Abstract

This paper is devoted to the study of vector valued reproducing
kernel Hilbert spaces. We focus on two aspects: vector valued feature
maps and universal kernels. In particular, we characterize the structure
of translation invariant kernels on abelian groups and we relate it to
the universality problem.

1 Introduction

In learning theory, reproducing kernel Hilbert spaces (RKHS) are an important
tool for designing learning algorithms, see for example [8], [29], [31] and
the book [9]. In the usual setting the elements of the RKHS are scalar functions.
The mathematical theory for scalar RKHS has been established in the
seminal paper [1]. For a standard reference see the book [23].

In machine learning there is an increasing interest in vector valued learning
algorithms, see [20], [12], II]. In this framework, the basic object is a Hilbert
space of functions $f$ from a set $X$ into a normed vector space $\mathcal{Y}$ with the
property that, for any $x \in X$, $\|f(x)\| \le C_x \|f\|$ for a positive constant $C_x$
independent of $f$.

The theory of vector valued RKHS has been completely worked out in the
seminal paper [27], devoted to the characterization of the Hilbert spaces that
are continuously embedded into a locally convex topological vector space, see
[23]. In the case $\mathcal{Y}$ is itself a Hilbert space, the theory can be simplified as
shown in [21] El E]. As in the scalar case, a RKHS is completely characterized
by a map $K$ from $X \times X$ into the space of bounded operators on $\mathcal{Y}$ such that

$$\sum_{i,j=1}^{N} \langle K(x_i, x_j) y_j, y_i \rangle \ge 0$$

for any $x_1, \ldots, x_N$ in $X$ and $y_1, \ldots, y_N$ in $\mathcal{Y}$. Such a map is called a
$\mathcal{Y}$-reproducing kernel and the corresponding RKHS is denoted by $\mathcal{H}_K$.

This paper focuses on three aspects of particular interest in vector valued 
learning problems: 

• vector valued feature maps; 

• universal reproducing kernels; 

• translation invariant reproducing kernels. 

The feature map approach is the standard way in which scalar RKHS are presented
in learning theory, see for example [2S]. A feature map is a function
mapping the input space $X$ into an arbitrary Hilbert space $\mathcal{H}$ in such a way
that $\mathcal{H}$ can be identified with a unique RKHS. Conversely, any RKHS can be
realized as a closed subspace of a concrete Hilbert space, called feature space,
by means of a suitable feature map - typical examples of feature spaces are
$\ell^2$ and $L^2(X, \mu)$ for some measure $\mu$.

The concept of feature map is extended to the vector valued setting in [HI E],
where a feature map is defined as a function from $X$ into the space of bounded
operators between $\mathcal{Y}$ and the feature space $\mathcal{H}$.

In the first part of our paper, Section 3 shows that sum, product and composition
with maps of RKHS can be easily described by suitable feature maps.
In particular we give an elementary proof of the Schur lemma about the product
of a scalar kernel with a vector valued kernel. Moreover, we present several
examples of vector valued RKHS, most of them considered in [22] E]. For
each one of them we exhibit a convenient feature space. This allows us to describe
the impact of these examples on some learning algorithms, like regularized
least-squares [13].

In the second part of the paper, Section 4 discusses the problem of characterizing
universal kernels. We say that a $\mathcal{Y}$-reproducing kernel is universal
if the corresponding RKHS $\mathcal{H}_K$ is dense in $L^2(X, \mu; \mathcal{Y})$ for any probability
measure $\mu$ on the input space $X$. This definition is motivated by observing that
in learning theory the goal is to approximate a target function $f^*$ by means
of a prediction function $f_n \in \mathcal{H}_K$, depending on the data, in such a way that
the distance between $f^*$ and $f_n$ goes to zero when the number of data $n$
goes to infinity. In learning theory the "right" distance is given by the norm
in $L^2(X, \mu; \mathcal{Y})$, where $\mu$ is the (unknown) probability distribution modeling
the sample of the input data, see [8]. The possibility of learning any target
function $f^*$ by means of functions in $\mathcal{H}_K$ is precisely the density of $\mathcal{H}_K$ in
$L^2(X, \mu; \mathcal{Y})$. Since the probability measure $\mu$ is unknown, we require that
the above property holds for any choice of $\mu$ - compare with the definition of
universal consistency for a learning algorithm [TH]. Under the condition that
the elements of $\mathcal{H}_K$ are continuous functions vanishing at infinity, we prove
that universality of $\mathcal{H}_K$ is equivalent to requiring that $\mathcal{H}_K$ is dense in $C_0(X; \mathcal{Y})$,
the Banach space of continuous functions vanishing at infinity with the uniform
norm. If $X$ is compact and $\mathcal{Y} = \mathbb{C}$, the density of $\mathcal{H}_K$ in $C_0(X; \mathcal{Y})$ is
precisely the definition of universality given in [30113^. For arbitrary $X$ and
$\mathcal{Y}$, another definition of universality is suggested in [3] under the assumption
that the elements of $\mathcal{H}_K$ are continuous functions. We show that this last
notion is equivalent to requiring that $\mathcal{H}_K$ is dense in $L^2(X, \mu; \mathcal{Y})$ for any
probability measure $\mu$ with compact support, or that $\mathcal{H}_K$ is dense in $C(X; \mathcal{Y})$,
the space of continuous functions with the compact-open topology. If $X$ is
not compact, the two definitions of universality are not equivalent, as we
show in two examples. To avoid confusion we refer to the second notion as
compact-universality.

We characterize both universality and compact-universality in terms of the
injectivity of the integral operator on $L^2(X, \mu; \mathcal{Y})$ whose kernel is the
reproducing kernel $K$. For compact-universal kernels, this result is presented in a
slightly different form in [5] - compare Theorem 2 below with Theorem 11 of
[5]. However, our statement of the theorem does not require a direct use of
vector valued measures; our proof is simpler and it is based on the fact that
any bounded linear functional $T$ on $C_0(X; \mathcal{Y})$ is of the form

$$T(f) = \int_X \langle f(x), h(x) \rangle \, d\mu(x),$$

where $\mu$ is a probability measure and $h$ is a bounded measurable function
from $X$ to $\mathcal{Y}$ - see the appendix. Notice that, although in learning theory
the main issue is the density of the RKHS $\mathcal{H}_K$ in $L^2(X, \mu; \mathcal{Y})$, our
results hold if, in the definition of universal kernels, we replace $L^2(X, \mu; \mathcal{Y})$
with $L^p(X, \mu; \mathcal{Y})$ for any $1 \le p < \infty$. In particular, we show that $\mathcal{H}_K$ is
dense in $C_0(X; \mathcal{Y})$ if and only if there exists $1 \le p < \infty$ such that $\mathcal{H}_K$ is
dense in $L^p(X, \mu; \mathcal{Y})$ for any probability measure $\mu$. In that case, $\mathcal{H}_K$ is
dense in $L^q(X, \mu; \mathcal{Y})$ for any $1 \le q < \infty$.

In the third part of the paper, under the assumption that $X$ is a group,
Section 5 studies translation invariant reproducing kernels, that is, the kernels
such that $K(x,t) = K_e(t^{-1}x)$ for some operator valued function
$K_e : X \to \mathcal{L}(\mathcal{Y})$ of completely positive type. In particular, we show that any
translation invariant kernel is of the form

$$K(x,t) = A \pi_{x^{-1}t} A^*$$

for some unitary representation $\pi$ of $X$ acting on a Hilbert space $\mathcal{H}$, and a
bounded operator $A : \mathcal{H} \to \mathcal{Y}$. If $X$ is an abelian group, the SNAG theorem [16]
provides a more explicit description of the reproducing kernel $K$, namely

$$K(x,t) = \int_{\hat{X}} \chi(t - x) \, dQ(\chi),$$

where $\hat{X}$ is the dual group and $Q$ is a positive operator valued measure
on $\hat{X}$. The above equation is precisely the content of the Bochner theorem for
operator valued functions of positive type [21 [IS]. In particular, we show that
the corresponding RKHS $\mathcal{H}_K$ can always be realized as a closed subspace of
$L^2(\hat{X}, \hat{\nu}; \mathcal{Y})$ where $\hat{\nu}$ is a suitable positive measure on $\hat{X}$. In this setting,
we give a sufficient condition ensuring that a translation invariant kernel is
universal. This condition is also necessary if $X$ is compact or $\mathcal{Y} = \mathbb{C}$. For
scalar kernels and compact-universality this result is given in [22]. We end
the paper by discussing in Section 6 the universality of some of the examples
introduced in Section 3.

2 Background 

In this section we set the main notations and we recall some basic facts about 
vector valued reproducing kernels. 

2.1 Notations and assumptions 

In the following we fix a locally compact second countable topological space
$X$ and a complex separable Hilbert space $\mathcal{Y}$, whose norm and scalar product
are denoted by $\|\cdot\|$ and $\langle \cdot, \cdot \rangle$ respectively. Local compactness of $X$ is needed in
order to prove Theorem 7 in the appendix, which is at the root of Theorem [H.
The separability of $X$ and $\mathcal{Y}$ will avoid some problems in measure theory.
All these assumptions are always satisfied in learning theory.

We denote by $\mathcal{F}(X; \mathcal{Y})$ the vector space of functions $f : X \to \mathcal{Y}$, by
$C(X; \mathcal{Y})$ the subspace of continuous functions, and by $C_0(X; \mathcal{Y})$ the subspace
of continuous functions vanishing at infinity. If $\mathcal{Y} = \mathbb{C}$, we set $C(X) = C(X; \mathbb{C})$
and $C_0(X) = C_0(X; \mathbb{C})$. If $X$ is compact, $C_0(X; \mathcal{Y}) = C(X; \mathcal{Y})$.
We regard $C(X; \mathcal{Y})$ as a locally convex topological vector space by endowing
it with the compact-open topology^1 and $C_0(X; \mathcal{Y})$ as a Banach space with
respect to the uniform norm $\|f\|_\infty = \max_{x \in X} \|f(x)\|$.

Let $\mathcal{B}(X)$ be the Borel $\sigma$-algebra of $X$. By a measure on $X$ we mean
a $\sigma$-additive map $\mu : \mathcal{B}(X) \to [0, +\infty]$ which is finite on compact sets^2.
We say that $\mu$ is a probability measure if $\mu(X) = 1$. For $1 \le p < \infty$,
$L^p(X, \mu; \mathcal{Y})$ denotes the Banach space of (equivalence classes of) measurable^3
functions $f : X \to \mathcal{Y}$ such that $\|f\|^p$ is $\mu$-integrable, with norm
$\|f\|_p = \left( \int_X \|f(x)\|^p \, d\mu(x) \right)^{1/p}$. If $p = 2$ we denote the scalar product in
$L^2(X, \mu; \mathcal{Y})$ by $\langle \cdot, \cdot \rangle_2$. For $p = \infty$, $L^\infty(X, \mu; \mathcal{Y})$ is the Banach space of
$\mu$-essentially bounded measurable functions $f : X \to \mathcal{Y}$ with norm
$\|f\|_\infty = \mu\text{-ess sup}_{x \in X} \|f(x)\|$.

If $\mu$ is a probability measure, clearly

$$C_0(X; \mathcal{Y}) \subset L^p(X, \mu; \mathcal{Y}) \subset L^q(X, \mu; \mathcal{Y})$$

for all $1 \le q < p < \infty$, each inclusion being continuous. Moreover, since $X$
is locally compact and second countable, $C_0(X; \mathcal{Y})$ is dense in $L^p(X, \mu; \mathcal{Y})$ for
any $1 \le p < \infty$.

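If $\mathcal{H}$ is an arbitrary (complex) Hilbert space we denote its scalar product
by $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ and its norm by $\|\cdot\|_{\mathcal{H}}$. When $\mathcal{H}'$ is another Hilbert space, we denote
by $\mathcal{L}(\mathcal{H}; \mathcal{H}')$ the Banach space of bounded operators from $\mathcal{H}$ to $\mathcal{H}'$ endowed
with the uniform norm. In the case $\mathcal{H} = \mathcal{H}'$, we set $\mathcal{L}(\mathcal{H}) = \mathcal{L}(\mathcal{H}; \mathcal{H})$.
Given $w_1, w_2 \in \mathcal{H}$, we let $w_1 \otimes w_2$ be the rank one operator

$$(w_1 \otimes w_2) v = \langle v, w_2 \rangle_{\mathcal{H}} \, w_1, \qquad v \in \mathcal{H}.$$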



^1 This is the topology of uniform convergence on compact subsets defined by the family
of seminorms $\|f\|_Z = \max_{x \in Z} \|f(x)\|$ for $Z$ varying over the compact subsets of $X$.

^2 Since $X$ is locally compact second countable, $\mu$ is both inner and outer regular.

^3 Since $\mathcal{Y}$ is separable, measurability is equivalent to the fact that $\langle f(\cdot), y \rangle$ is
measurable for all $y \in \mathcal{Y}$.

2.2 Vector valued reproducing kernels

We briefly recall the main properties of vector valued reproducing kernel
Hilbert spaces. Given $X$ and $\mathcal{Y}$ as above, a map $K : X \times X \to \mathcal{L}(\mathcal{Y})$ is
called a $\mathcal{Y}$-reproducing kernel if

$$\sum_{i,j=1}^{N} \langle K(x_i, x_j) y_j, y_i \rangle \ge 0$$

for any $x_1, \ldots, x_N$ in $X$, $y_1, \ldots, y_N$ in $\mathcal{Y}$ and $N \ge 1$. Given $x \in X$,
$K_x : \mathcal{Y} \to \mathcal{F}(X; \mathcal{Y})$ denotes the linear operator whose action on a vector $y \in \mathcal{Y}$ is the
function $K_x y \in \mathcal{F}(X; \mathcal{Y})$ defined by

$$(K_x y)(t) = K(t, x) y, \qquad t \in X. \tag{1}$$

Given a $\mathcal{Y}$-reproducing kernel $K$, there is a unique Hilbert space $\mathcal{H}_K \subset \mathcal{F}(X; \mathcal{Y})$
satisfying

$$K_x \in \mathcal{L}(\mathcal{Y}; \mathcal{H}_K), \qquad x \in X, \tag{2}$$

$$f(x) = K_x^* f, \qquad x \in X, \ f \in \mathcal{H}_K, \tag{3}$$

where $K_x^* : \mathcal{H}_K \to \mathcal{Y}$ is the adjoint of $K_x$, see Proposition 2.1 of [6]. The
space $\mathcal{H}_K$ is called the reproducing kernel Hilbert space associated with $K$;
the corresponding scalar product and norm are denoted by $\langle \cdot, \cdot \rangle_K$ and $\|\cdot\|_K$,
respectively. As a consequence of (3), we have that

$$K(x,t) = K_x^* K_t, \qquad x, t \in X,$$

$$\mathcal{H}_K = \overline{\operatorname{span}} \{ K_x y \mid x \in X, \ y \in \mathcal{Y} \}.$$

As discussed in the introduction, the space Hk can be realized as a closed 
subspace of some arbitrary Hilbert space by means of a suitable feature map, 
as shown by the next result, see Proposition 2.4 of [6]. 

Proposition 1. Let $\mathcal{H}$ be a Hilbert space and $\gamma : X \to \mathcal{L}(\mathcal{Y}; \mathcal{H})$. Then the
operator $W : \mathcal{H} \to \mathcal{F}(X; \mathcal{Y})$ defined by

$$(Wu)(x) = \gamma_x^* u, \qquad u \in \mathcal{H}, \ x \in X, \tag{4}$$

is a partial isometry from $\mathcal{H}$ onto the reproducing kernel Hilbert space $\mathcal{H}_K$
with reproducing kernel

$$K(x,t) = \gamma_x^* \gamma_t, \qquad x, t \in X. \tag{5}$$

Moreover, $W^*W$ is the orthogonal projection onto

$$\ker W^{\perp} = \overline{\operatorname{span}} \{ \gamma_x y \mid x \in X, \ y \in \mathcal{Y} \},$$

and

$$\|f\|_K = \inf \{ \|u\|_{\mathcal{H}} \mid u \in \mathcal{H}, \ Wu = f \}.$$

The map $\gamma$ is usually called the feature map, $W$ the feature operator and
$\mathcal{H}$ the feature space. Since $W$ is an isometry from $\ker W^{\perp}$ onto $\mathcal{H}_K$, the map
$W$ allows us to identify $\mathcal{H}_K$ with the closed subspace $\ker W^{\perp}$ of $\mathcal{H}$. With a
mild abuse of notation, we say that $\mathcal{H}_K$ is embedded into $\mathcal{H}$ by means of the
feature operator $W$.
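As a rough numerical illustration of the feature map construction, the following sketch takes $\mathcal{Y} = \mathbb{R}^2$ and $\mathcal{H} = \mathbb{R}^3$ (real rather than complex spaces, for simplicity) and checks that the quadratic form of the kernel $K(x,t) = \gamma_x^* \gamma_t$ coincides with $\| \sum_j \gamma_{x_j} y_j \|^2$, hence is non-negative. The particular map `gamma` and all helper names are hypothetical choices made only for this example.

```python
import random

def gamma(x):
    # Hypothetical feature map gamma_x : R^2 -> R^3, stored as a 3x2 matrix.
    return [[1.0, 0.0], [x, 1.0], [0.0, x]]

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def kernel(x, t):
    # K(x, t) = gamma_x^* gamma_t, a 2x2 matrix acting on Y = R^2.
    return matmul(transpose(gamma(x)), gamma(t))

def quadratic_form(xs, ys):
    # sum_{i,j} <K(x_i, x_j) y_j, y_i>
    return sum(dot(matvec(kernel(xi, xj), yj), yi)
               for xi, yi in zip(xs, ys) for xj, yj in zip(xs, ys))

def feature_norm_squared(xs, ys):
    # || sum_j gamma_{x_j} y_j ||^2 computed in the feature space R^3
    u = [0.0, 0.0, 0.0]
    for x, y in zip(xs, ys):
        u = [a + b for a, b in zip(u, matvec(gamma(x), y))]
    return dot(u, u)

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(5)]
ys = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(5)]
q = quadratic_form(xs, ys)
s = feature_norm_squared(xs, ys)
print(q, s)
```

The two printed numbers agree up to rounding, which is the finite dimensional content of equation (5).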

Comparing (jl]) with ([S]), we notice that any RKHS Hk admits a trivial feature 
map, namely $\gamma_x = K_x$. In this case the feature operator is the identity.
Conversely, if $\mathcal{H}$ is a Hilbert space of functions from $X$ to $\mathcal{Y}$ such that
$\|f(x)\| \le C_x \|f\|_{\mathcal{H}}$ for some positive constant $C_x$, then there exists a bounded
operator $\gamma_x : \mathcal{Y} \to \mathcal{H}$ such that $f(x) = \gamma_x^* f$. Hence, the above proposition implies
that $\mathcal{H}$ is a RKHS with kernel given by (5) and that the feature operator is
the identity.

2.3 Mercer and $C_0$-kernels

In this paper, we mainly focus on reproducing kernel Hilbert spaces whose
elements are continuous functions. In particular we study the following two
classes of reproducing kernels.

Definition 1. A reproducing kernel $K : X \times X \to \mathcal{L}(\mathcal{Y})$ is called

(i) Mercer provided that $\mathcal{H}_K$ is a subspace of $C(X; \mathcal{Y})$;

(ii) $C_0$ provided that $\mathcal{H}_K$ is a subspace of $C_0(X; \mathcal{Y})$.

The choice of $C(X; \mathcal{Y})$ and $C_0(X; \mathcal{Y})$ is motivated in Section 4, where we
discuss the universality problem.

The following proposition directly characterizes Mercer and $C_0$-kernels in
terms of properties of the kernels.

Proposition 2. Let $K$ be a reproducing kernel.

(i) The kernel $K$ is Mercer iff the function $x \mapsto \|K(x,x)\|$ is locally
bounded and $K_x y \in C(X; \mathcal{Y})$ for all $x \in X$ and $y \in \mathcal{Y}$.

(ii) The kernel $K$ is $C_0$ iff the function $x \mapsto \|K(x,x)\|$ is bounded and
$K_x y \in C_0(X; \mathcal{Y})$ for all $x \in X$ and $y \in \mathcal{Y}$.

If $K$ is a Mercer kernel, the inclusion $\mathcal{H}_K \hookrightarrow C(X; \mathcal{Y})$ is continuous. If $K$ is
a $C_0$-kernel, the inclusion $\mathcal{H}_K \hookrightarrow C_0(X; \mathcal{Y})$ is continuous. In both cases, the
space $\mathcal{H}_K$ is separable.



Proof. We prove only (ii), since the other proof is similar - see Proposition 5.1
of [6]. If $\mathcal{H}_K \subset C_0(X; \mathcal{Y})$, it is clear that $K_x y$ is an element of $C_0(X; \mathcal{Y})$.
Moreover, since $\|K_x^* f\| = \|f(x)\| \le \|f\|_\infty$ for all $f \in \mathcal{H}_K$, by the principle of
uniform boundedness there exists $M < \infty$ such that $\|K_x^*\| \le M$ for all $x$.
Therefore, $\|K(x,x)\| = \|K_x^*\|^2 \le M^2$ for all $x$.

Conversely, assume that the function $x \mapsto \|K(x,x)\|$ is bounded and
$K_x y \in C_0(X; \mathcal{Y})$. Given $f \in \mathcal{H}_K$, we have

$$\|f(x)\| \le \|f\|_K \, \|K(x,x)\|^{1/2} \le M \|f\|_K.$$

In particular, convergence in $\mathcal{H}_K$ implies uniform convergence, so that the
closure (in $\mathcal{H}_K$) of the linear span of $\{ K_x y \mid x \in X, \ y \in \mathcal{Y} \}$ is contained in
$C_0(X; \mathcal{Y})$, i.e. $\mathcal{H}_K \subset C_0(X; \mathcal{Y})$.

The continuity of the inclusion of $\mathcal{H}_K$ in $C_0(X; \mathcal{Y})$ follows from
$\|f\|_\infty \le M \|f\|_K$. Finally, $\mathcal{H}_K$ is separable by Corollary 5.2 of [6]. □

If $\mathcal{H}_K$ is defined by means of a feature map $\gamma$, the above characterization
can be expressed in terms of $\gamma$, as shown by the following result.

Corollary 1. With the notations of Proposition 1, the following conditions
are equivalent.

(a) The kernel $K$ is Mercer [resp. $C_0$].

(b) There is a total set $S$ in $\mathcal{H}$ such that $W(S) \subset C(X; \mathcal{Y})$ [resp. $W(S) \subset C_0(X; \mathcal{Y})$]
and the function $x \mapsto \|\gamma_x\|$ is locally bounded [resp. bounded].

Proof. We give the proof only in the case of a $C_0$-kernel, the other case being
simpler. Suppose that (a) holds true, i.e. $\mathcal{H}_K \subset C_0(X; \mathcal{Y})$; then
$W(S) \subset \operatorname{ran} W = \mathcal{H}_K \subset C_0(X; \mathcal{Y})$ for every subset $S$ of $\mathcal{H}$. Moreover,
$\|\gamma_x\|^2 = \|K(x,x)\| \le M$ by item (ii) of Proposition 2. Conversely, if condition (b)
holds, we have that for all $x \in X$ and $u \in \mathcal{H}$

$$\|(Wu)(x)\| = \|K_x^*(Wu)\| \le \|K_x^*\| \, \|W\| \, \|u\|_{\mathcal{H}} \le \|K(x,x)\|^{1/2} \|u\|_{\mathcal{H}} \le M^{1/2} \|u\|_{\mathcal{H}},$$

where $\|W\| \le 1$ since $W$ is a partial isometry. Then $W$ maps $\mathcal{H}$ into the space
of bounded functions and $W$ is continuous from $\mathcal{H}$ onto $\mathcal{H}_K$ endowed with
the uniform norm. Since $W(S) \subset C_0(X; \mathcal{Y})$ and $C_0(X; \mathcal{Y})$ is complete, then
$\mathcal{H}_K \subset C_0(X; \mathcal{Y})$. □



2.4 Mercer theorem 

For a Mercer kernel $K$, there is a canonical feature map, based on the Mercer
theorem, which relates the spectral properties of the integral operator with
kernel $K$ and the structure of the corresponding reproducing kernel Hilbert
space. This result will also be used in the examples.

To state this result for vector valued reproducing kernels, we need some
preliminary facts. First of all, if $K$ is a Mercer kernel and $\mu$ is a probability
measure on $X$, the space $\mathcal{H}_K$ is a subspace of $L^2(X, \mu; \mathcal{Y})$, provided that
$\|K(x,x)\|$ is bounded on the support of $\mu$. This last condition is always
satisfied if $K$ is a $C_0$-kernel or if $\mu$ has compact support. If $\mathcal{H}_K$ is a subspace
of $L^2(X, \mu; \mathcal{Y})$, we denote the canonical inclusion by

$$i_\mu : \mathcal{H}_K \to L^2(X, \mu; \mathcal{Y}).$$

The next result states some properties of $i_\mu$; its proof is a consequence of
Propositions 3.3, 4.4 and 4.8 of [6].

Proposition 3. Let $K$ be a Mercer kernel and $\mu$ a probability measure such
that $K$ is bounded on the support of $\mu$. The inclusion $i_\mu$ is a bounded operator,
its adjoint $i_\mu^* : L^2(X, \mu; \mathcal{Y}) \to \mathcal{H}_K$ is given by

$$(i_\mu^* f)(x) = \int_X K(x,t) f(t) \, d\mu(t),$$

where the integral converges in norm, and the composition $i_\mu i_\mu^* = L_\mu$ is the
integral operator on $L^2(X, \mu; \mathcal{Y})$ with kernel $K$

$$(L_\mu f)(x) = \int_X K(x,t) f(t) \, d\mu(t).$$

In particular, if $K(x,x)$ is a compact operator for all $x \in X$, then $L_\mu$ is a
compact operator.

The fact that $L_\mu$ is a compact operator implies that there is a family
$(f_i)_{i \in I}$ of eigenvectors in $C(X; \mathcal{Y})$ and a family $(\sigma_i)_{i \in I}$ of eigenvalues in $]0, \infty[$
such that $(f_i)_{i \in I}$ is an orthonormal basis of $\ker L_\mu^{\perp} = \overline{\operatorname{ran} L_\mu}$ and

$$L_\mu f_i = \sigma_i f_i. \tag{6}$$

With this notation we are ready to state the Mercer theorem for vector valued
kernels. Its proof is a consequence of Proposition 6.1 and Theorem 6.3 of [6].



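Proposition 4. Let $\mu$ be a probability measure with $\operatorname{supp} \mu = X$. Suppose
$K$ is a Mercer kernel such that $\sup_{x \in X} \|K(x,x)\| < \infty$, and $K(x,x)$ is a
compact operator for all $x \in X$. With the notation of (6), we have that

$$\mathcal{H}_K = \Big\{ f \in C(X; \mathcal{Y}) \cap \ker L_\mu^{\perp} \ \Big| \ \sum_{i \in I} \frac{|\langle f, f_i \rangle_2|^2}{\sigma_i} < \infty \Big\} \tag{7}$$

$$\langle f, g \rangle_K = \sum_{i \in I} \frac{\langle f, f_i \rangle_2 \, \langle f_i, g \rangle_2}{\sigma_i} \tag{8}$$

$$K(x,t) = \sum_{i \in I} \sigma_i \, f_i(x) \otimes f_i(t) \tag{9}$$

where the last series converges in the strong operator topology of $\mathcal{L}(\mathcal{Y})$.

Equations (7) and (8) imply that $(\sqrt{\sigma_i} f_i)_{i \in I}$ is an orthonormal basis in
$\mathcal{H}_K$. In particular the vectors $\sqrt{\sigma_i} f_i$ are $\ell^2$-linearly independent in $\mathcal{F}(X; \mathcal{Y})$,
namely, if $(c_i)_{i \in I}$ is a family such that $\sum_{i \in I} |c_i|^2 < \infty$ and
$\sum_{i \in I} c_i \sqrt{\sigma_i} f_i(x) = 0$ for all $x \in X$, then $c_i = 0$ for all $i \in I$.

As said at the beginning of Section 2.4, Proposition 4 gives a feature
operator, which is often used in learning theory.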



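Example 1. With the assumptions and notations of Proposition 4, the
reproducing kernel Hilbert space $\mathcal{H}_K$ is unitarily equivalent to
$\ker L_\mu^{\perp} = \overline{\operatorname{ran} L_\mu}$ by means of the feature operator

$$(Wf)(x) = \sum_{i \in I} \sqrt{\sigma_i} \, f_i(x) \, \langle f, f_i \rangle_2 = (L_\mu^{1/2} f)(x), \qquad f \in L^2(X, \mu; \mathcal{Y}). \tag{10}$$

Proof. Given $x \in X$, define

$$\gamma_x : \mathcal{Y} \to L^2(X, \mu; \mathcal{Y}), \qquad \gamma_x y = \sum_{i \in I} \sqrt{\sigma_i} \, \langle y, f_i(x) \rangle \, f_i,$$

which is well defined since $(f_i)_{i \in I}$ is an orthonormal family of continuous
functions and (9) ensures that $\sum_{i \in I} \sigma_i |\langle y, f_i(x) \rangle|^2 < \infty$. Using (9) again, one
checks that $\gamma_x^* \gamma_t = K(x,t)$. The fact that the feature operator is given by (10)
is clear by definition of $\gamma_x$. Since $\ker W = \ker L_\mu$, $W$ is a unitary operator
from $\ker L_\mu^{\perp}$ onto $\mathcal{H}_K$. □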

2.5 Trivial examples 

We give two examples of trivial vector valued kernels.

Example 2. Let $B \in \mathcal{L}(\mathcal{Y})$ be a positive operator and define $K(x,t) = B$ for
all $x, t \in X$; then $K$ is a $\mathcal{Y}$-reproducing kernel, and $\mathcal{H}_K$ is unitarily equivalent to
$\ker B^{\perp} = \overline{\operatorname{ran} B}$ by means of the feature operator

$$(Wy)(x) = B^{1/2} y, \qquad x \in X, \ y \in \ker B^{\perp}.$$

The kernel $K$ is of Mercer type and it is a $C_0$-kernel if and only if $X$ is
compact.

Proof. Apply Proposition 1 with $\mathcal{H} = \ker B^{\perp}$ and $\gamma_x = B^{1/2}$. Since $B$ is
injective on $\mathcal{H}$, then $W$ is unitary. The claims about the continuity are
clear. □

Example 3. Let $f : X \to \mathcal{Y}$, $f \ne 0$. Define $K(x,t) = f(x) \otimes f(t)$; then
$K$ is a reproducing kernel, and $\mathcal{H}_K$ is unitarily equivalent to $\mathbb{C}$ by means of the
feature operator

$$(Wc)(x) = c f(x), \qquad x \in X, \ c \in \mathbb{C}.$$

In particular $K$ is Mercer [resp. $C_0$] if and only if $f \in C(X; \mathcal{Y})$ [resp. $f \in C_0(X; \mathcal{Y})$].

Proof. Apply Proposition 1 with $\mathcal{H} = \mathbb{C}$ and $\gamma_x y = \langle y, f(x) \rangle$. Since $f \ne 0$,
$W$ is injective. The characterization about Mercer and $C_0$ is trivial. □

3 Operations with kernels 

In this section we characterize reproducing kernel Hilbert spaces whose kernel 
is defined by algebraic operations, like sum, product and composition. Most 
of the results are well known for scalar kernels, whereas for vector valued ker- 
nels they are consequences of the theory developed in [27] in a more general 
context. We provide a direct and simple proof of these results, based on the 
use of suitable feature maps. In some cases, our approach can be of interest 
also in the scalar case, like, for example, in proving Schur lemma about the 
product of kernels. 

As an application, we present a large supply of examples of vector valued
reproducing kernels and, for most of them, we realize the corresponding RKHS
by elegant and simple structures. This characterization will be used to analyze
some learning algorithms, like regularized least-squares, in the vector
valued setting.

3.1 Sum of kernels

The following result extends to vector valued kernels the relation between 
sum of kernels and sum of the corresponding reproducing kernel Hilbert 
spaces. 

Proposition 5. Denote by $I$ a countable set and let $(K^i)_{i \in I}$ be a family of
$\mathcal{Y}$-reproducing kernels such that

$$\sum_{i \in I} \langle K^i(x,x) y, y \rangle < \infty \qquad \forall y \in \mathcal{Y} \text{ and } \forall x \in X.$$

Given $x, t \in X$ the series $\sum_{i \in I} K^i(x,t)$ converges to a bounded operator
$K(x,t)$ in the strong operator topology, and the map $K : X \times X \to \mathcal{L}(\mathcal{Y})$
defined by

$$K(x,t) y = \sum_{i \in I} K^i(x,t) y$$

is a $\mathcal{Y}$-reproducing kernel. The corresponding space $\mathcal{H}_K$ is embedded in
$\bigoplus_{i \in I} \mathcal{H}_{K^i}$ by means of the feature operator

$$W(f)(x) = \sum_{i \in I} f_i(x) \qquad \text{where } f = \oplus_{i \in I} f_i,$$

where the sum converges in norm.

Moreover, if each $K^i$ is a Mercer kernel [resp. $C_0$-kernel] and
$x \mapsto \sum_{i \in I} \|K^i(x,x)\|$ is locally bounded [resp. bounded], then $K$ is Mercer [resp. $C_0$].

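Proof. We apply Proposition 1. Letting $\mathcal{H} = \bigoplus_{i \in I} \mathcal{H}_{K^i}$, we regard each $\mathcal{H}_{K^i}$
as a closed subspace of $\mathcal{H}$ so that any two of them are orthogonal. Given
$x \in X$, we define the bounded operator $\gamma_x : \mathcal{Y} \to \mathcal{H}$ by $\gamma_x = \sum_{i \in I} K^i_x$, where
the series converges in the strong operator topology since, given $y \in \mathcal{Y}$,

$$\sum_{i \in I} \|K^i_x y\|_{K^i}^2 = \sum_{i \in I} \langle K^i(x,x) y, y \rangle < \infty$$

by assumption, see [7]. Given $i \in I$ and $f_i \in \mathcal{H}_{K^i}$, then

$$\langle \gamma_x^* f_i, y \rangle = \langle f_i, K^i_x y \rangle_{K^i} = \langle f_i(x), y \rangle$$

by the reproducing property (3), so that $\gamma_x^* f_i = f_i(x)$. Since $\gamma_x^*$ is continuous, for
any $f = \oplus_{i \in I} f_i$,

$$(Wf)(x) = \gamma_x^* f = \sum_{i \in I} \gamma_x^* f_i = \sum_{i \in I} f_i(x),$$

where the series converges in norm.

Finally, $K(x,t) y = \gamma_x^* \gamma_t y = \sum_{i \in I} (K^i_t y)(x) = \sum_{i \in I} K^i(x,t) y$, that is,
$K(x,t) = \sum_{i \in I} K^i(x,t)$ in the strong operator topology.

The second part is a consequence of Corollary 1 with $S = \bigcup_{i \in I} \mathcal{H}_{K^i}$. □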

As an application, we have the following example. 

Example 4. Let $(f_i)_{i \in I}$ be a countable family of functions $f_i : X \to \mathcal{Y}$ such that
$\sum_{i \in I} |\langle f_i(x), y \rangle|^2$ is finite for all $x \in X$ and $y \in \mathcal{Y}$. Define
$K : X \times X \to \mathcal{L}(\mathcal{Y})$ as

$$K(x,t) = \sum_{i \in I} f_i(x) \otimes f_i(t).$$

Then, the sum converges in the strong operator topology, $K$ is a reproducing
kernel and

$$\mathcal{H}_K = \Big\{ f \in \mathcal{F}(X; \mathcal{Y}) \ \Big| \ f(x) = \sum_{i \in I} c_i f_i(x), \ \sum_{i \in I} |c_i|^2 < \infty \Big\}. \tag{11}$$

In particular $(f_i)_{i \in I}$ is a normalized tight frame in $\mathcal{H}_K$. It is an orthonormal
basis if and only if $(f_i)_{i \in I}$ is $\ell^2$-linearly independent in $\mathcal{F}(X; \mathcal{Y})$.

Proof. Apply Proposition 5 with $K^i(x,t) = f_i(x) \otimes f_i(t)$, observing that
$\mathcal{H}_{K^i} \simeq \mathbb{C}$ by Example 3, so that $\bigoplus_{i \in I} \mathcal{H}_{K^i} \simeq \ell^2$. The feature operator is
explicitly given by

$$W(c)(x) = \sum_{i \in I} c_i f_i(x) \qquad \text{where } c = (c_i)_{i \in I}, \ \sum_{i \in I} |c_i|^2 < \infty,$$

so that (11) is clear. If $(e_i)_{i \in I}$ is the canonical orthonormal basis of $\ell^2$, then
$W e_i = f_i$ and, for any $f \in \mathcal{H}_K$,

$$\|f\|_K^2 = \|W^* f\|_{\ell^2}^2 = \sum_{i \in I} |\langle W^* f, e_i \rangle_{\ell^2}|^2 = \sum_{i \in I} |\langle f, f_i \rangle_K|^2,$$

i.e. $(f_i)_{i \in I}$ is a normalized tight frame in $\mathcal{H}_K$. Clearly, it is an orthonormal
basis if and only if $W$ is unitary, i.e. $W$ is injective. This is precisely the
condition that $(f_i)_{i \in I}$ is $\ell^2$-linearly independent in $\mathcal{F}(X; \mathcal{Y})$. □

Proposition 4 shows that any RKHS with a bounded compact Mercer
kernel can be realized as in the above example, where the functions $f_i$ are the
eigenfunctions (with $\|f_i\|_2^2 = \sigma_i$) of the integral operator $L_\mu$ with eigenvalues
$\sigma_i > 0$, and $\mu$ is any probability measure with $\operatorname{supp} \mu = X$, see (9).

3.2 Composition with maps

We now describe the reproducing kernel Hilbert spaces whose kernel is defined 
in terms of a mother kernel and suitable maps acting either on the input space 
X or on the output space y. The following result characterizes the action of 
a bounded operator on y. 

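Proposition 6. Let $K$ be a $\mathcal{Y}$-reproducing kernel. Let $\mathcal{Y}'$ be another Hilbert
space and $w : \mathcal{Y} \to \mathcal{Y}'$ be a bounded operator. Define

$$K_w : X \times X \to \mathcal{L}(\mathcal{Y}'), \qquad K_w(x,t) = w K(x,t) w^*,$$

then $K_w$ is a $\mathcal{Y}'$-reproducing kernel and $\mathcal{H}_{K_w}$ is embedded in $\mathcal{H}_K$ by means of

$$W : \mathcal{H}_K \to \mathcal{H}_{K_w}, \qquad (Wf)(x) = w f(x), \quad x \in X.$$

If $w$ is injective, $\mathcal{H}_{K_w}$ is unitarily equivalent to $\mathcal{H}_K$. Moreover, if $K$ is Mercer
[resp. $C_0$], then $K_w$ is Mercer [resp. $C_0$].

Proof. Let $\gamma_x : \mathcal{Y}' \to \mathcal{H}_K$, $\gamma_x = K_x w^*$, and apply Proposition 1 with $\mathcal{H} = \mathcal{H}_K$.
The feature operator from $\mathcal{H}_K$ onto $\mathcal{H}_{K_w}$ is explicitly given by
$(Wf)(x) = \gamma_x^* f = w f(x)$. If $w$ is injective, then $W$ is unitary. The second claim is
evident. □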
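The mechanism behind Proposition 6 can be checked numerically in finite dimensions: the quadratic form of $K_w = w K w^*$ on vectors $y_i \in \mathcal{Y}'$ equals the quadratic form of $K$ on the pulled back vectors $w^* y_i \in \mathcal{Y}$, so positivity is inherited. The sketch below is only an illustration under assumed data: a Gaussian scalar kernel times a fixed positive matrix `B` plays the role of $K$, and the matrix `w` is an arbitrary hypothetical choice (real spaces are used for simplicity).

```python
import math
import random

def kappa(x, t):
    # Assumed scalar kernel (Gaussian), used only to build an example K.
    return math.exp(-(x - t) ** 2)

B = [[2.0, 1.0], [1.0, 2.0]]                 # positive operator on Y = R^2
w = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]    # hypothetical w : R^2 -> R^3

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, M):
    return [[sum(A[i][k] * M[k][j] for k in range(len(M)))
             for j in range(len(M[0]))] for i in range(len(A))]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def kernel(x, t):
    # K(x, t) = kappa(x, t) B, a 2x2 matrix
    return [[kappa(x, t) * b for b in row] for row in B]

def kernel_w(x, t):
    # K_w(x, t) = w K(x, t) w^*, a 3x3 matrix
    return matmul(matmul(w, kernel(x, t)), transpose(w))

def quadratic_form(k, xs, ys):
    return sum(dot(matvec(k(xi, xj), yj), yi)
               for xi, yi in zip(xs, ys) for xj, yj in zip(xs, ys))

random.seed(1)
xs = [random.uniform(-2, 2) for _ in range(4)]
ys = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
pulled_back = [matvec(transpose(w), y) for y in ys]   # w^* y_i in R^2

q_w = quadratic_form(kernel_w, xs, ys)
q_k = quadratic_form(kernel, xs, pulled_back)
print(q_w, q_k)
```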

We now study the action of an arbitrary map on X. 

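Proposition 7. Let $K$ be a $\mathcal{Y}$-reproducing kernel on $X$. Let $T$ be another
locally compact second countable topological space, and $\Psi : T \to X$. Define

$$K_\Psi : T \times T \to \mathcal{L}(\mathcal{Y}), \qquad K_\Psi(t_1, t_2) = K(\Psi(t_1), \Psi(t_2)), \quad t_1, t_2 \in T.$$

Then $K_\Psi$ is a $\mathcal{Y}$-reproducing kernel on $T$, and the space $\mathcal{H}_{K_\Psi}$ is unitarily
equivalent to

$$\overline{\operatorname{span}} \{ K_x y \mid x \in \operatorname{ran} \Psi, \ y \in \mathcal{Y} \} = \{ f \in \mathcal{H}_K \mid f(x) = 0 \ \forall x \in \operatorname{ran} \Psi \}^{\perp}$$

by means of the feature operator

$$W : \mathcal{H}_K \to \mathcal{H}_{K_\Psi}, \qquad W(f)(t) = f(\Psi(t)), \quad f \in \mathcal{H}_K, \ t \in T.$$

If $K$ is a Mercer kernel and $\Psi$ is continuous, then $K_\Psi$ is Mercer. If $K$ is a
$C_0$-kernel and $\Psi$ is continuous and proper, then $K_\Psi$ is $C_0$.

Proof. Apply Proposition 1 with $\mathcal{H} = \mathcal{H}_K$ and, for any $t \in T$, $\gamma_t = K_{\Psi(t)}$,
observing that $\ker W = \{ f \in \mathcal{H}_K \mid f(x) = 0 \ \forall x \in \operatorname{ran} \Psi \}$.
The claims about Mercer and $C_0$-kernels are clear. □

In the above proposition observe that $\ker W^{\perp}$ can be identified with the
quotient space $\mathcal{H}_K / \ker W$, so that one also has the natural identification

$$\mathcal{H}_{K_\Psi} \simeq \{ f|_{\operatorname{ran} \Psi} \mid f \in \mathcal{H}_K \} \tag{12}$$

where the r.h.s. is endowed with the norm

$$\| f|_{\operatorname{ran} \Psi} \| = \inf \{ \|g\|_K \mid g \in \mathcal{H}_K, \ g|_{\operatorname{ran} \Psi} = f|_{\operatorname{ran} \Psi} \}.$$

As a consequence, we describe the relation between a kernel and its
restriction to a subset.

Corollary 2. Let $X_0$ be a subset of $X$. Let $K_{X_0}$ be the restriction of $K$ to
$X_0 \times X_0$, then

$$\mathcal{H}_{K_{X_0}} = \{ f|_{X_0} \mid f \in \mathcal{H}_K \}.$$

If $K$ is Mercer and $X_0$ is locally closed, then $K_{X_0}$ is Mercer. If $K$ is $C_0$ and
$X_0$ is closed, then $K_{X_0}$ is $C_0$.

Proof. Apply Proposition 7 and identification (12), with $\Psi$ the canonical
inclusion of $X_0$ in $X$. □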

We end this part by describing the reproducing kernel Hilbert space as- 
sociated with the kernel proposed in [5]. 

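Proposition 8. Let $\kappa$ be a scalar reproducing kernel on $X$. Let $T$ be another
locally compact second countable topological space. Let $\Psi_1, \ldots, \Psi_m$ be
functions from $T$ to $X$ and define $K(t_1, t_2)$ as the $m \times m$ matrix

$$K(t_1, t_2)_{ij} = \kappa(\Psi_i(t_1), \Psi_j(t_2)), \qquad i, j = 1, \ldots, m, \quad t_1, t_2 \in T.$$

Then $K$ is a $\mathbb{C}^m$-reproducing kernel on $T$, and the space $\mathcal{H}_K$ is embedded in $\mathcal{H}_\kappa$
by means of the feature operator

$$(W\varphi)(t) = (\varphi(\Psi_1(t)), \ldots, \varphi(\Psi_m(t))), \qquad \varphi \in \mathcal{H}_\kappa, \ t \in T.$$

If one of $\Psi_1, \ldots, \Psi_m$ is surjective, then $W$ is unitary.

Proof. Apply Proposition 1 with $\mathcal{H} = \mathcal{H}_\kappa$ and $\gamma_t : \mathbb{C}^m \to \mathcal{H}_\kappa$,

$$\gamma_t(y_1, \ldots, y_m) = \sum_{i=1}^{m} y_i \, \kappa_{\Psi_i(t)},$$

so that $(\gamma_t^* \varphi)_i = \varphi(\Psi_i(t))$.

If $\Psi_i$ is surjective for some index $i = 1, \ldots, m$, the condition $\varphi(\Psi_i(t)) = 0$
for all $t \in T$ implies that $\varphi(x) = 0$ for all $x \in X$, that is, $\varphi = 0$. Hence $W$ is
injective and, hence, unitary. □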
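The construction of Proposition 8 can be tried out on a toy case. The sketch below assumes the polynomial scalar kernel $\kappa(x,t) = (1 + xt)^2$ on $X = \mathbb{R}$, which has the explicit three dimensional feature map `phi`, and three hypothetical maps $\Psi_i$; it checks that the quadratic form of the matrix valued kernel equals $\| \sum_{a} \gamma_{t_a} y_a \|^2$, with $\gamma_t$ as in the proof. All names and choices are illustrative only.

```python
import math
import random

def kappa(x, t):
    # Assumed scalar kernel with an explicit finite dimensional feature map.
    return (1.0 + x * t) ** 2

def phi(x):
    # Feature vector with <phi(x), phi(t)> = kappa(x, t).
    return [1.0, math.sqrt(2.0) * x, x * x]

# Hypothetical maps Psi_1, Psi_2, Psi_3 : T -> X (here T = X = R).
psis = [lambda t: t, lambda t: t * t, lambda t: 1.0 - t]
m = len(psis)

def kernel(t1, t2):
    # K(t1, t2)_{ij} = kappa(Psi_i(t1), Psi_j(t2)), an m x m matrix.
    return [[kappa(pi(t1), pj(t2)) for pj in psis] for pi in psis]

def quadratic_form(ts, ys):
    total = 0.0
    for ta, ya in zip(ts, ys):
        for tb, yb in zip(ts, ys):
            k_ab = kernel(ta, tb)
            total += sum(k_ab[i][j] * yb[j] * ya[i]
                         for i in range(m) for j in range(m))
    return total

def feature_norm_squared(ts, ys):
    # gamma_t(y) = sum_i y_i * phi(Psi_i(t)); accumulate sum_a gamma_{t_a}(y_a).
    u = [0.0, 0.0, 0.0]
    for t, y in zip(ts, ys):
        for i, psi in enumerate(psis):
            u = [a + y[i] * b for a, b in zip(u, phi(psi(t)))]
    return sum(a * a for a in u)

random.seed(2)
ts = [random.uniform(-1, 1) for _ in range(4)]
ys = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(4)]
q = quadratic_form(ts, ys)
s = feature_norm_squared(ts, ys)
print(q, s)
```

Since $\Psi_1$ is the identity, hence surjective, this toy case also falls under the last statement of the proposition.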
3.3 Product of kernels

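The following proposition extends the Schur lemma about products of reproducing
kernels to the vector valued case.

Proposition 9. Let $K$ be a $\mathcal{Y}$-kernel and $\kappa$ a scalar kernel. Define

$$(\kappa K)(x,t) = \kappa(x,t) K(x,t), \qquad x, t \in X,$$

then $\kappa K$ is a $\mathcal{Y}$-reproducing kernel and $\mathcal{H}_{\kappa K}$ is embedded into $\mathcal{H}_\kappa \otimes \mathcal{H}_K$ by
means of the feature operator

$$W(\varphi \otimes f)(x) = \varphi(x) f(x), \qquad \varphi \in \mathcal{H}_\kappa, \ f \in \mathcal{H}_K.$$

If both $\kappa$ and $K$ are Mercer kernels, so is $\kappa K$, whereas if

$$\sup_{x \in X} \{ \kappa(x,x), \|K(x,x)\| \} < \infty \quad \text{and} \quad
\begin{cases}
\kappa_x \in C_0(X) \text{ and } K_x y \in C(X; \mathcal{Y}) \\
\quad \text{or} \\
\kappa_x \in C(X) \text{ and } K_x y \in C_0(X; \mathcal{Y})
\end{cases} \tag{13}$$

then $\kappa K$ is a $C_0$-kernel.

Proof. Let $\mathcal{H} = \mathcal{H}_\kappa \otimes \mathcal{H}_K$. Since $\kappa$ is a scalar kernel, $\kappa_x \in \mathcal{H}_\kappa$. Define
$\gamma_x : \mathcal{Y} \to \mathcal{H}$ by means of $\gamma_x y = \kappa_x \otimes K_x y$, then $\gamma_x^*(\varphi \otimes f) = \varphi(x) f(x)$. The first
claim is a consequence of Proposition 1.

If both $\kappa$ and $K$ are Mercer kernels, clearly $\kappa K$ is Mercer.

To prove that if (13) holds then $\kappa K$ is $C_0$, we apply Corollary 1 with
$S = \{ \varphi \otimes f \mid \varphi \in \mathcal{H}_\kappa, \ f \in \mathcal{H}_K \}$, and observe that

$$\|\gamma_x\| \le \|\kappa_x\|_\kappa \, \|K_x\| \le C,$$

by assumption. □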

Based on the above results, we characterize the RKHS whose kernel is 
given in 0. 

Example 5. Let $\kappa$ be a scalar reproducing kernel and $B$ a positive bounded
operator on $\mathcal{Y}$. Define $K : X \times X \to \mathcal{L}(\mathcal{Y})$ as

$$K(x,t) = \kappa(x,t) B, \qquad x, t \in X.$$

(i) The map $K$ is a $\mathcal{Y}$-reproducing kernel and $\mathcal{H}_K$ is unitarily equivalent
to $\mathcal{H}_\kappa \otimes \ker B^{\perp}$ by means of the unitary operator

$$W(\varphi \otimes y)(x) = \varphi(x) B^{1/2} y.$$

(ii) If $\kappa$ is Mercer [resp. $C_0$], then $K$ is Mercer [resp. $C_0$], too.

(iii) If there is an orthonormal basis $(y_i)_{i \in I}$ of $\ker B^{\perp}$ such that $B y_i = \sigma_i y_i$
(so that $\sigma_i > 0$ for all $i \in I$), then $\mathcal{H}_K$ is unitarily equivalent to $\bigoplus_{i \in I} \mathcal{H}_\kappa$
by means of the unitary operator

$$W(\oplus_{i \in I} \varphi_i)(x) = \sum_{i \in I} \sqrt{\sigma_i} \, \varphi_i(x) \, y_i, \tag{14}$$

where the series converges in norm.

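Proof. The first two items are a consequence of Proposition 9 and Example 2.
We prove item (iii) in two steps. Apply first Proposition 6 with $w : \mathcal{Y} \to \ell^2$,
$(wy)_i = \langle y, y_i \rangle$, so that $\mathcal{H}_{K_w}$ is embedded in $\mathcal{H}_K$ by means of the feature
operator $W_w(f) = w \circ f$ for all $f \in \mathcal{H}_K$. The corresponding $\ell^2$-kernel is
$K_w(x,t) = \kappa(x,t) \, w B w^*$. By definition of $w$, the kernel $K_w$ is diagonal with
respect to $(e_i)_{i \in I}$, the canonical basis of $\ell^2$, namely

$$K_w(x,t) = \sum_{i \in I} \sigma_i \, \kappa(x,t) \, e_i \otimes e_i =: \sum_{i \in I} K^i(x,t),$$

where the series converges in the strong operator topology.
Now observe that, for each $i \in I$, $\ker(\sigma_i \, e_i \otimes e_i)^{\perp} = \mathbb{C} e_i$, so that by item (i)
of this example, the space $\mathcal{H}_{K^i}$ is unitarily equivalent to $\mathcal{H}_\kappa \otimes \mathbb{C} e_i \simeq \mathcal{H}_\kappa$
through the feature operator

$$W^i : \mathcal{H}_\kappa \to \mathcal{H}_{K^i}, \qquad W^i(\varphi)(x) = \varphi(x) \sqrt{\sigma_i} \, e_i.$$

Applying Proposition 5 to the family $(K^i)_{i \in I}$, we obtain a unitary operator

$$\widetilde{W} : \bigoplus_{i \in I} \mathcal{H}_\kappa \to \mathcal{H}_{K_w}, \qquad \widetilde{W}(\oplus_{i \in I} \varphi_i)(x) = \sum_{i \in I} \varphi_i(x) \sqrt{\sigma_i} \, e_i$$

(the operator $\widetilde{W}$ is unitary since $\sigma_i > 0$ for all $i \in I$, so that $\widetilde{W}$ is injective).
Equation (14) is finally obtained letting $W = W_w^* \widetilde{W}$. □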
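The diagonal decomposition used in the proof of item (iii) is easy to check by hand in a small case. The following sketch assumes $\mathcal{Y} = \mathbb{R}^2$, a Gaussian scalar kernel, and the positive matrix $B$ with rows $(2,1)$ and $(1,2)$, whose eigenvalues are $3$ and $1$ with orthonormal eigenvectors $(1,1)/\sqrt{2}$ and $(1,-1)/\sqrt{2}$; it verifies that $\kappa(x,t) B = \sum_i \sigma_i \kappa(x,t) \, y_i \otimes y_i$. These concrete choices are illustrative assumptions, not taken from the text.

```python
import math

def kappa(x, t):
    # Assumed scalar kernel (Gaussian).
    return math.exp(-(x - t) ** 2)

B = [[2.0, 1.0], [1.0, 2.0]]
sigmas = [3.0, 1.0]                      # eigenvalues of B
r = 1.0 / math.sqrt(2.0)
basis = [[r, r], [r, -r]]                # orthonormal eigenvectors y_1, y_2

def kernel(x, t):
    # K(x, t) = kappa(x, t) B
    return [[kappa(x, t) * b for b in row] for row in B]

def kernel_from_spectrum(x, t):
    # sum_i sigma_i kappa(x, t) y_i (x) y_i, where (y (x) y) v = <v, y> y
    k = kappa(x, t)
    return [[sum(s * k * y[a] * y[b] for s, y in zip(sigmas, basis))
             for b in range(2)] for a in range(2)]

def max_abs_difference(x, t):
    direct = kernel(x, t)
    spectral = kernel_from_spectrum(x, t)
    return max(abs(direct[a][b] - spectral[a][b])
               for a in range(2) for b in range(2))

error = max(max_abs_difference(x, t)
            for x in (-1.0, 0.0, 0.7) for t in (-0.5, 0.0, 2.0))
print(error)
```

In this picture each summand $\sigma_i \kappa \, y_i \otimes y_i$ contributes one copy of the scalar space $\mathcal{H}_\kappa$, which is the content of (14).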

If in Example 5, $\mathcal{Y}$ is a RKHS of scalar functions over some set $X'$, then
there is a particular choice for the operator B, suggested by Example [H 

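Example 6. Let $X$ and $X'$ be two locally compact second countable topological
spaces. Let $\kappa : X \times X \to \mathbb{C}$ and $\kappa' : X' \times X' \to \mathbb{C}$ be two scalar
reproducing kernels on $X$ and $X'$, respectively.

(i) If $I'$ denotes the identity operator on $\mathcal{H}_{\kappa'}$, define

$$K : X \times X \to \mathcal{L}(\mathcal{H}_{\kappa'}), \qquad K(x,t) = \kappa(x,t) I',$$

then $K$ is a $\mathcal{H}_{\kappa'}$-reproducing kernel on $X$ and the corresponding RKHS
$\mathcal{H}_K$ is unitarily equivalent to $\mathcal{H}_\kappa \otimes \mathcal{H}_{\kappa'}$ by means of the feature operator

$$W : \mathcal{H}_\kappa \otimes \mathcal{H}_{\kappa'} \to \mathcal{H}_K, \qquad W(\varphi_1 \otimes \varphi_2)(x) = \varphi_1(x) \varphi_2.$$

(ii) Define $\kappa \times \kappa' : (X \times X') \times (X \times X') \to \mathbb{C}$ as

$$(\kappa \times \kappa')(x, x'; t, t') = \kappa(x,t) \, \kappa'(x', t'),$$

then $\kappa \times \kappa'$ is a scalar kernel on $X \times X'$ and $\mathcal{H}_{\kappa \times \kappa'}$ is unitarily equivalent
to $\mathcal{H}_K$ by means of the feature operator

$$\widetilde{W}(f)(x, x') = [f(x)](x') = \langle f(x), \kappa'_{x'} \rangle_{\kappa'}, \qquad f \in \mathcal{H}_K.$$

Proof. The first part follows from Example 5 with $\mathcal{Y} = \mathcal{H}_{\kappa'}$ and $B = I'$,
which is injective. The second part is a consequence of Proposition 1 applied
to

$$\gamma : X \times X' \to \mathcal{L}(\mathbb{C}; \mathcal{H}_K) \simeq \mathcal{H}_K, \qquad (x, x') \mapsto W(\kappa_x \otimes \kappa'_{x'}),$$

taking into account the injectivity of $\widetilde{W}$ and the equalities

$$\langle W(\varphi_1 \otimes \varphi_2), \gamma(x, x') \rangle_K = \langle \varphi_1 \otimes \varphi_2, \kappa_x \otimes \kappa'_{x'} \rangle = \varphi_1(x) \, \langle \varphi_2, \kappa'_{x'} \rangle_{\kappa'}$$

$$= \langle W(\varphi_1 \otimes \varphi_2)(x), \kappa'_{x'} \rangle_{\kappa'} = \widetilde{W}(W(\varphi_1 \otimes \varphi_2))(x, x').$$

□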

By using Proposition [H] on the space $X'$, the above example can be realized in an alternative way.

Example 7. Let $X$ and $X'$ be two locally compact second countable topological spaces. Let $\kappa : X \times X \to \mathbb{C}$ and $\kappa' : X' \times X' \to \mathbb{C}$ be two scalar $C_0$-reproducing kernels on $X$ and $X'$, respectively. Let $\mu'$ be a probability measure on $X'$ with $\operatorname{supp}\mu' = X'$ and $L_{\mu'}$ be the integral operator on $L^2(X',\mu')$ with kernel $\kappa'$. Define

\[ K : X \times X \to \mathcal{L}(L^2(X',\mu')), \qquad K(x,t) = \kappa(x,t)\, L_{\mu'} ; \]

then the kernel $K$ is an $L^2(X',\mu')$-reproducing kernel and the space $\mathcal{H}_K$ is unitarily equivalent to $\mathcal{H}_\kappa \otimes \mathcal{H}_{\kappa'}$ by means of

\[ W(f \otimes g)(x) = f(x)\, i_{\mu'}(g), \qquad f \in \mathcal{H}_\kappa,\ g \in \mathcal{H}_{\kappa'}, \]

where $i_{\mu'}$ is the inclusion of $\mathcal{H}_{\kappa'}$ in $L^2(X',\mu')$. In particular, $K$ is a $C_0$-kernel.
Proof. Apply Proposition [U] to the kernel $\kappa I'$ of the previous example and to $w = i_{\mu'}$, which is injective. Clearly $(\kappa I')_w = \kappa\, i_{\mu'} i_{\mu'}^* = K$, so that $\mathcal{H}_K$ is unitarily equivalent to $\mathcal{H}_{\kappa I'}$. The thesis follows immediately from the previous example. □

The above example shows that $\mathcal{H}_{\kappa I'}$ and $\mathcal{H}_K$ are the same RKHS, where the elements of $\mathcal{H}_{\kappa I'}$ are regarded as functions from $X$ into $\mathcal{H}_{\kappa'}$, whereas the elements of $\mathcal{H}_K$ are regarded as functions from $X$ into $L^2(X',\mu')$.

3.4 Application to learning theory 

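We end this section by considering an application of some of the above examples to vector valued regression problems. In learning theory, a popular algorithm is the minimization on an RKHS $\mathcal{H}_K$ of the empirical error with a penalty term proportional to the square of the norm [13], namely

\[ f^* = \operatorname*{argmin}_{f \in \mathcal{H}_K} \Big( \frac{1}{n}\sum_{i=1}^n \|y^i - f(x^i)\|_{\mathcal{Y}}^2 + \lambda \|f\|_K^2 \Big). \qquad (15) \]

Here $\{(x^1,y^1),\dots,(x^n,y^n)\}$ is the training set of $n$ input-output pairs $(x^i,y^i) \in X \times \mathcal{Y}$ and $\lambda > 0$ is the regularization parameter. If the reproducing kernel $K$ is as in Example [5], then it can be checked that

\[ f^*(x) = \sum_{i\in I} \varphi_i^*(x)\, y_i , \]

where each $\varphi_i^*$ is given by

\[ \varphi_i^* = \operatorname*{argmin}_{\varphi \in \mathcal{H}_\kappa} \Big( \frac{1}{n}\sum_{j=1}^n |y_i^j - \varphi(x^j)|^2 + \frac{\lambda}{\sigma_i} \|\varphi\|_\kappa^2 \Big) \]

and $y_i^j = \langle y^j, y_i\rangle$.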

In many applications $\mathcal{Y} = \mathbb{C}^m$, so that $B$ is an $m \times m$ positive semi-definite matrix. The above observation reduces the problem of computing the minimizer of (15) to $|I|$ scalar problems, where the cardinality $|I|$ is the rank of the matrix $B$.
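The reduction to scalar problems can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes real outputs in $\mathbb{R}^m$, a Gaussian scalar kernel, and the standard representer-theorem linear systems for regularized least squares; all function names are hypothetical. It fits the coupled problem for $K = \kappa B$ directly and compares it with one scalar ridge problem per nonzero eigenvalue $\sigma_i$ of $B$, each with regularization parameter $\lambda/\sigma_i$.

```python
import numpy as np

def gaussian_gram(P, Q, gamma=1.0):
    """Scalar Gaussian kernel matrix kappa(p, q) = exp(-gamma * |p - q|^2)."""
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def fit_direct(G, B, Y, lam):
    """Coupled problem: solve (G kron B + n*lam*I) c = y, outputs stacked sample by sample."""
    n, m = Y.shape
    big = np.kron(G, B) + n * lam * np.eye(n * m)
    return np.linalg.solve(big, Y.reshape(-1)).reshape(n, m)

def predict_direct(k_new, B, C):
    """f(x) = sum_j kappa(x, x^j) B c_j (B is symmetric), one row per new input."""
    return k_new @ C @ B

def fit_decoupled(G, B, Y, lam, tol=1e-10):
    """One scalar ridge problem per nonzero eigenvalue sigma_i of B, with parameter lam / sigma_i."""
    n = G.shape[0]
    sigma, U = np.linalg.eigh(B)
    keep = sigma > tol
    sigma, U = sigma[keep], U[:, keep]
    coeffs = [np.linalg.solve(G + (n * lam / s) * np.eye(n), Y @ U[:, i])
              for i, s in enumerate(sigma)]
    return np.column_stack(coeffs), U

def predict_decoupled(k_new, A, U):
    """f(x) = sum_i phi_i(x) u_i, where column i of A holds the coefficients of phi_i."""
    return (k_new @ A) @ U.T

rng = np.random.default_rng(0)
X_train = rng.normal(size=(8, 2))
X_new = rng.normal(size=(5, 2))
Y_train = rng.normal(size=(8, 3))
M = rng.normal(size=(3, 2))
B = M @ M.T                      # positive semi-definite, rank 2
lam = 0.1

G = gaussian_gram(X_train, X_train)
k_new = gaussian_gram(X_new, X_train)
f_direct = predict_direct(k_new, B, fit_direct(G, B, Y_train, lam))
A, U = fit_decoupled(G, B, Y_train, lam)
f_decoupled = predict_decoupled(k_new, A, U)
print(np.max(np.abs(f_direct - f_decoupled)))
```

In this sketch the coupled system has size $nm \times nm$, whereas the decoupled route solves $\operatorname{rank}(B)$ systems of size $n \times n$.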

With the choice of $K$ as in Proposition [H], let $f^*$ be the minimizer given by (15), where the $n$ examples in the training set are the pairs $(t^i, y^i) \in T \times \mathbb{R}^m$. By using the fact that $W$ is a partial surjective isometry, one can check that

\[ f^*(t) = \big(\varphi^*(\Psi_1(t)), \dots, \varphi^*(\Psi_m(t))\big), \]

where $\varphi^*$ is given by

\[ \varphi^* = \operatorname*{argmin}_{\varphi \in \mathcal{H}_\kappa} \Big( \frac{1}{n}\sum_{i=1}^n \sum_{j=1}^m |y_j^i - \varphi(x_j^i)|^2 + \lambda \|\varphi\|_\kappa^2 \Big), \]

where $y_j^i \in \mathbb{R}$ are the components of the output $y^i \in \mathbb{R}^m$ and $x_j^i = \Psi_j(t^i) \in X$. With this choice the problem (15) is reduced to a minimization problem on the scalar RKHS $\mathcal{H}_\kappa$.



4 Universal kernels: main results 

In this section we address the problem of defining and characterizing the universality of a kernel $K$. As pointed out in the introduction, in learning theory a necessary condition in order to have universally consistent algorithms is the assumption that the reproducing kernel Hilbert space $\mathcal{H}_K$ is dense in $L^2(X,\mu;\mathcal{Y})$ for any probability measure $\mu$. From this point of view the next definition is very natural.

Definition 2. Let $K : X \times X \to \mathcal{L}(\mathcal{Y})$ be a reproducing kernel.

(i) A $C_0$-kernel $K$ is called universal if $\mathcal{H}_K$ is dense in $L^2(X,\mu;\mathcal{Y})$ for each probability measure $\mu$.

(ii) A Mercer kernel $K$ is called compact-universal if $\mathcal{H}_K$ is dense in $L^2(X,\mu;\mathcal{Y})$ for each probability measure $\mu$ with compact support.

We briefly comment on the above definitions. In item (i) the assumption that the kernel is $C_0$ ensures both that $\mathcal{H}_K$ is a subspace of $L^2(X,\mu;\mathcal{Y})$ and that universality is equivalent to the density of $\mathcal{H}_K$ in $C_0(X;\mathcal{Y})$ (see Theorem [1]). In item (ii), since $\mu$ has compact support, it is enough to assume that $K$ is a Mercer kernel in order to have $\mathcal{H}_K \subset L^2(X,\mu;\mathcal{Y})$. This last property turns out to be equivalent to the definition of universality given in [5].

Clearly a universal kernel is also compact-universal. Conversely, a $C_0$-kernel can be compact-universal but not universal, as shown by Examples [8] and [11].

Notice that in Definition [2], if we replace $L^2(X,\mu;\mathcal{Y})$ with $L^p(X,\mu;\mathcal{Y})$ for an arbitrary $1 \le p < \infty$, we have in principle a different notion of universality. Nevertheless Theorem [1] clarifies that there is no difference. We state the results for $p = 2$, since it is the natural choice in learning theory.

The following corollary shows that universality is preserved by restriction 
to a subset. 
Corollary 3. Let $X_0$ be a subset of $X$.

(i) If $X_0$ is closed and $K$ is universal, then $K_{X_0}$ is universal.

(ii) If $X_0$ is locally closed and $K$ is compact-universal, then $K_{X_0}$ is compact-universal.

Proof. We only prove (i). Since $X_0$ is closed, Corollary [2] implies that $K_{X_0}$ is a $C_0$-kernel, and a function $f$ belongs to $\mathcal{H}_{K_{X_0}}$ if and only if there exists $g \in \mathcal{H}_K$ such that $f = g|_{X_0}$. Given a probability measure $\mu$ on $X_0$, let $\nu$ be the probability measure on $X$ given by $\nu(E) = \mu(E \cap X_0)$ for any Borel subset $E$ of $X$. By universality of $K$, $\mathcal{H}_K$ is dense in $L^2(X,\nu;\mathcal{Y}) \simeq L^2(X_0,\mu;\mathcal{Y})$, where the equivalence is given by the restriction from $X$ to $X_0$, so that $\mathcal{H}_{K_{X_0}}$ is dense in $L^2(X_0,\mu;\mathcal{Y})$. □

The converse is clearly not true. Notice that the compact-universal kernels are precisely the Mercer kernels such that $K_{X_0}$ is universal for any compact subset $X_0$ of $X$.

In the next subsections we discuss separately the two notions of univer- 
sality and then we make a comparison between them. 

4.1 Universality and $C_0$-kernels

In this section we characterize the universal $C_0$-kernels. The first result shows that the density of $\mathcal{H}_K$ in $L^2(X,\mu;\mathcal{Y})$ for any probability measure $\mu$ is equivalent to the density in $C_0(X;\mathcal{Y})$ and that one can replace $L^2(X,\mu;\mathcal{Y})$ with $L^p(X,\mu;\mathcal{Y})$, $1 \le p < \infty$.

Theorem 1. Suppose $K$ is a $C_0$-kernel. The following facts are equivalent.

(a) The kernel $K$ is universal.

(b) The space $\mathcal{H}_K$ is dense in $C_0(X;\mathcal{Y})$.

(c) There is $1 \le p < \infty$ such that $\mathcal{H}_K$ is dense in $L^p(X,\mu;\mathcal{Y})$ for all probability measures $\mu$ on $X$.

Proof. Clearly (a) implies (c). Since $X$ is locally compact and second countable, $C_0(X;\mathcal{Y})$ is dense in $L^2(X,\mu;\mathcal{Y})$, where the inclusion is continuous, so that (b) implies (a).

We show that (c) implies (b). Suppose hence that $\mathcal{H}_K$ is not dense in $C_0(X;\mathcal{Y})$. Then there exists $T \in C_0(X;\mathcal{Y})^*$, $T \neq 0$, such that $T(f) = 0$ for all $f \in \mathcal{H}_K$. By Theorem [TJ there is a probability measure $\mu$ on $X$ and a function $h \in L^\infty(X,\mu;\mathcal{Y})$ such that $T(f) = \int_X \langle f(x), h(x)\rangle\, d\mu(x)$. Since $T \neq 0$, then $h \neq 0$.

Since $\mu$ is a probability measure, $h$ is a non-null element in $L^{p/(p-1)}(X,\mu;\mathcal{Y}) = L^p(X,\mu;\mathcal{Y})^*$ (where we set $1/0 = \infty$) such that

\[ \int_X \langle f(x), h(x)\rangle\, d\mu(x) = 0 \qquad \forall f \in \mathcal{H}_K . \]

It follows that $\mathcal{H}_K$ is not dense in $L^p(X,\mu;\mathcal{Y})$. □

As a consequence of the previous theorem, we have the following nice 
corollary. 

Corollary 4. Let $K$ be a $C_0$-kernel. Given $1 \le p < q < \infty$, the space $\mathcal{H}_K$ is dense in $L^p(X,\mu;\mathcal{Y})$ for all probability measures $\mu$ if and only if it is dense in $L^q(X,\mu;\mathcal{Y})$ for all probability measures $\mu$.

The previous result is not trivial. Clearly, if $q > p$, the space $L^q(X,\mu;\mathcal{Y})$ is always a dense subspace of $L^p(X,\mu;\mathcal{Y})$ and the inclusion is continuous. Hence, if an RKHS $\mathcal{H}_K$ is dense in $L^q(X,\mu;\mathcal{Y})$, then $\mathcal{H}_K$ is always dense in $L^p(X,\mu;\mathcal{Y})$. However, in general $L^p(X,\mu;\mathcal{Y})$ is not contained in $L^q(X,\mu;\mathcal{Y})$, so that, if $\mathcal{H}_K$ is dense in $L^p(X,\mu;\mathcal{Y})$, the density of $\mathcal{H}_K$ in $L^q(X,\mu;\mathcal{Y})$ has to be proved. Corollary 4 shows this result under the assumption that $K$ is $C_0$.

Now we give a characterization of the universality of $K$ in terms of the injectivity of the integral operators $L_\mu$, for $\mu$ varying over the probability measures on $X$.

Theorem 2. Suppose $K$ is a $C_0$-kernel. Then the following facts are equivalent.

(a) The kernel $K$ is universal.

(b) The operator $i_\mu^* : L^2(X,\mu;\mathcal{Y}) \to \mathcal{H}_K$ is injective for all probability measures $\mu$ on $X$.

(c) The integral operator $L_\mu : L^2(X,\mu;\mathcal{Y}) \to L^2(X,\mu;\mathcal{Y})$ is injective for all probability measures $\mu$ on $X$.

The proof is an immediate consequence of Theorem [1] and the next proposition.

Proposition 10. Let $K$ be a Mercer kernel and $\mu$ a fixed probability measure on $X$ such that $K$ is bounded on the support of $\mu$. The following facts are equivalent.

(a) The space $\mathcal{H}_K$ is dense in $L^2(X,\mu;\mathcal{Y})$.

(b) The operator $i_\mu^*$ is injective.

(c) The integral operator $L_\mu$ is injective.

Proof. The space $\mathcal{H}_K$ is dense in $L^2(X,\mu;\mathcal{Y})$ if and only if the range of $i_\mu$ is dense in $L^2(X,\mu;\mathcal{Y})$. This last condition is equivalent to the injectivity of $i_\mu^*$, that is, (a) is equivalent to (b). Since $L_\mu = i_\mu i_\mu^*$ and $\ker L_\mu = \ker i_\mu^*$, (b) and (c) are equivalent. □
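On a finite set the equivalence between density and injectivity of the integral operator reduces to linear algebra, which gives a quick sanity check. The sketch below is an illustration under simplifying assumptions that are not in the paper: $X$ consists of five points, $\mu$ is uniform (hence has full support), and the kernels are scalar. The RKHS is then the column space of the Gram matrix $G$, and $L_\mu$ acts as $G\,\mathrm{diag}(\mu)$.

```python
import numpy as np

def integral_operator(G, w):
    """Matrix of L_mu on a finite set: (L_mu phi)(x) = sum_t K(x, t) phi(t) mu({t})."""
    return G * w[None, :]

pts = np.arange(5, dtype=float)          # X = {0, 1, 2, 3, 4}
w = np.full(5, 0.2)                      # uniform probability measure on X

G_gauss = np.exp(-(pts[:, None] - pts[None, :]) ** 2)   # Gaussian kernel
G_lin = np.outer(pts, pts)                               # linear kernel x * t

# Dimension of the RKHS (column space of G) and of the null space of L_mu.
rank_gauss = np.linalg.matrix_rank(G_gauss)
rank_lin = np.linalg.matrix_rank(G_lin)
null_gauss = 5 - np.linalg.matrix_rank(integral_operator(G_gauss, w))
null_lin = 5 - np.linalg.matrix_rank(integral_operator(G_lin, w))
print(rank_gauss, null_gauss, rank_lin, null_lin)
```

In this toy setting the Gaussian kernel spans all functions on $X$ and its integral operator has trivial null space, while the linear kernel spans a one-dimensional space and its integral operator is far from injective.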

4.2 Compact-universality 

In this section we characterize compact-universality of Mercer kernels and we show that compact-universality is precisely what is called universality in [5].

The next theorem characterizes compact-universality.

Theorem 3. Suppose $K$ is a Mercer kernel. The following facts are equivalent.

(a) The kernel $K$ is compact-universal.

(b) The space $\mathcal{H}_K$ is dense in $C(X;\mathcal{Y})$ endowed with the compact-open topology.

(c) There is $1 \le p < \infty$ such that $\mathcal{H}_K$ is dense in $L^p(X,\mu;\mathcal{Y})$ for all compactly supported probability measures $\mu$.

Proof. Clearly (a) implies (c). We prove that (b) implies (a). Indeed, fix a probability measure $\mu$ with compact support $Z$. The fact that $\mathcal{H}_K$ is dense in $C(X;\mathcal{Y})$ implies that $\mathcal{H}_K|_Z := \{ f|_Z \mid f \in \mathcal{H}_K \}$ is dense in $C(Z;\mathcal{Y})$, but $C(Z;\mathcal{Y})$ is clearly dense in $L^2(Z,\mu;\mathcal{Y}) \simeq L^2(X,\mu;\mathcal{Y})$ with continuous injection. Hence $\mathcal{H}_K$ is dense in $L^2(X,\mu;\mathcal{Y})$. It only remains to prove that (c) implies (b). For this, it is enough to prove that $\mathcal{H}_K|_Z$ is dense in $C(Z;\mathcal{Y})$ with the uniform norm, for all compact subsets $Z$ of $X$. But this is a simple consequence of Theorem [1], since $\mathcal{H}_K|_Z$ is clearly dense in $L^p(Z,\mu;\mathcal{Y})$ for all probability measures $\mu$ on $Z$, and $C(Z;\mathcal{Y}) = C_0(Z;\mathcal{Y})$. □

The analog of Theorem [2] also holds.

Theorem 4. Suppose $K$ is a Mercer kernel. Then the following facts are equivalent.

(a) The kernel $K$ is compact-universal.

(b) The operator $i_\mu^* : L^2(X,\mu;\mathcal{Y}) \to \mathcal{H}_K$ is injective for all compactly supported probability measures $\mu$ on $X$.

(c) The integral operator $L_\mu : L^2(X,\mu;\mathcal{Y}) \to L^2(X,\mu;\mathcal{Y})$ is injective for all probability measures $\mu$ on $X$ with compact support.

The proof is a simple consequence of Proposition [10].

Clearly universality of a $C_0$-kernel $K$ implies compact-universality. The converse is not true, as shown by the following example; see also Example [11]. The reason for this phenomenon is that $C_0(X;\mathcal{Y})$ endowed with the compact-open topology is not continuously embedded in $L^p(X,\mu;\mathcal{Y})$.

Example 8. Let $X = \mathbb{Z}_+$, and let $\ell^2$ be the Hilbert space of square summable sequences. Then $\ell^2$ is an RKHS of scalar functions on $X$ with reproducing kernel $K(i,j) = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta. We fix the following sequence $\{f_k\}_{k\in\mathbb{Z}_+}$ in $\ell^2$

\[ f_k(j) = \delta_{j,k} + e\,\delta_{j,k+1}, \]

and we let

\[ \mathcal{H}_{\tilde K} = \ell^2\text{-cl}\,\operatorname{span}\{ f_k \mid k \in \mathbb{Z}_+ \} \qquad (16) \]

($\ell^2$-cl denotes the closure in $\ell^2$). $\mathcal{H}_{\tilde K}$ is also an RKHS of scalar functions on $X$, whose reproducing kernel we denote by $\tilde K$. Since $\ell^2 \subset c_0$ (= the sequences going to $0$ at infinity), $\tilde K$ is a $C_0$-reproducing kernel.

For all $n \in \mathbb{Z}_+$, let $Z_n = \{1, 2, \dots, n\}$. $Z_n$ is compact in $X$, and every compact set $Z \subset X$ is contained in some $Z_n$. Clearly,

\[ C(Z_n) = \operatorname{span}\{ (f_k)|_{Z_n} \mid k \le n \}, \]

hence $\mathcal{H}_{\tilde K}$ is dense in $C(X)$ with the topology of uniform convergence on compact subsets.

Let $\mu$ be the probability measure on $X$ such that $\mu(\{j\}) = (e-1)e^{-j}$. We claim that $\mathcal{H}_{\tilde K}$ is not dense in $L^2(X,\mu)$. In fact, let $f \in L^2(X,\mu)$ be the function $f(j) = (-1)^j$. We have $\langle f_k, f\rangle_{L^2(X,\mu)} = 0$ for all $k$. By (16) and continuity of the inclusion $\ell^2 \hookrightarrow L^2(X,\mu)$, we see that $f$ is in the orthogonal complement of $\mathcal{H}_{\tilde K}$ in $L^2(X,\mu)$. The claim then follows.
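The computations behind this example are easy to verify numerically. The sketch below is only an illustration and rests on assumptions not stated in the text: the index set is taken to be $\{1, 2, \dots\}$, the measure is $\mu(\{j\}) = (e-1)e^{-j}$, and everything is truncated to the first 60 points. It checks that $\mu$ has total mass one up to the truncation, that $f(j) = (-1)^j$ is orthogonal in $L^2(X,\mu)$ to each generator $f_k = \delta_k + e\,\delta_{k+1}$, and that the restrictions of $f_1, \dots, f_n$ to $\{1, \dots, n\}$ are linearly independent.

```python
import math
import numpy as np

def f_k(k, size):
    """Truncation of f_k = delta_k + e * delta_{k+1} to the points 1..size."""
    v = np.zeros(size)
    v[k - 1] = 1.0
    if k < size:
        v[k] = math.e
    return v

J = 60                                        # truncation level (assumption)
j = np.arange(1, J + 1)
mu = (math.e - 1.0) * np.exp(-j.astype(float))
f = np.where(j % 2 == 0, 1.0, -1.0)           # f(j) = (-1)^j

total_mass = mu.sum()
# <f_k, f> in L^2(mu) for every k whose support {k, k+1} lies inside the truncation.
inner = np.array([np.sum(f_k(k, J) * f * mu) for k in range(1, J)])

n = 6
restricted = np.array([f_k(k, n) for k in range(1, n + 1)])
rank = np.linalg.matrix_rank(restricted)
print(total_mass, np.max(np.abs(inner)), rank)
```

The two terms of each inner product cancel because $e \cdot e^{-(k+1)} = e^{-k}$, which is exactly why the weight $e$ appears in the definition of $f_k$.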

A universal kernel is strictly positive definite, but the converse in general 
fails, as shown by the following corollary and example. 

Corollary 5. Suppose $K$ is a compact-universal kernel. Then $K$ is strictly positive definite, i.e. for all finite subsets $\{x_1, x_2, \dots, x_N\}$ of $X$ such that $x_i \neq x_j$ if $i \neq j$, the condition

\[ \sum_{i,j=1}^N \langle K(x_i,x_j) y_j, y_i\rangle = 0 \qquad (y_i \in \mathcal{Y},\ i = 1,\dots,N) \]

implies $y_i = 0$ for all $i = 1,\dots,N$.

Proof. Assume $\sum_{i,j=1}^N \langle K(x_i,x_j) y_j, y_i\rangle = 0$ for some finite subset $\{x_1, x_2, \dots, x_N\} \subset X$, $x_i \neq x_j$ if $i \neq j$, and $\{y_1, y_2, \dots, y_N\}$ in $\mathcal{Y}$. Taking

\[ \mu = \frac{1}{N}\sum_{i=1}^N \delta_{x_i} \qquad\text{and}\qquad \varphi = \sum_{j=1}^N y_j\, \mathbf{1}_{\{x_j\}}, \]

we obtain a probability measure $\mu$ on $X$ with compact support and a function $\varphi \in L^2(X,\mu;\mathcal{Y})$ such that

\[ 0 = \sum_{i,j=1}^N \langle K(x_i,x_j) y_j, y_i\rangle = N^2 \int_X\!\int_X \langle K(x,t)\varphi(t), \varphi(x)\rangle\, d\mu(t)\, d\mu(x) = N^2 \int_X \langle (L_\mu\varphi)(x), \varphi(x)\rangle\, d\mu(x) = N^2 \langle L_\mu\varphi, \varphi\rangle_{L^2} . \]

Since $L_\mu$ is positive and injective by Theorem [4], we have $\varphi(x_i) = 0$ for all $i = 1,\dots,N$. Since $x_i \neq x_j$ if $i \neq j$, then $y_i = 0$ for all $i = 1,\dots,N$. □

The converse of the above corollary fails to be true, as shown by the 
following example. 

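Example 9. Let $K : \mathbb{R} \times \mathbb{R} \to \mathbb{C}$ be the kernel

\[ K(x,t) = \int_{-1}^{1} e^{2\pi i (x-t)p}\, dp = \frac{\sin 2\pi(x-t)}{\pi(x-t)} . \]

The map $K$ is a scalar $C_0$-kernel, which is strictly positive definite, but not universal.

Proof. We show that it is strictly positive definite. Indeed, let $x_1, \dots, x_N \in X$ be such that $x_i \neq x_j$ if $i \neq j$, let $c_1, \dots, c_N \in \mathbb{C}$ and suppose

\[ 0 = \sum_{i,j=1}^N c_i \overline{c_j}\, K(x_i,x_j) = \int_{-1}^{1} \Big| \sum_j c_j e^{2\pi i x_j p} \Big|^2 dp . \]

Since $p \mapsto |\sum_j c_j e^{2\pi i x_j p}|^2$ is continuous, it follows that $|\sum_j c_j e^{2\pi i x_j p}|^2 = 0$ for all $p \in [-1,1]$. Observing that the functions $f_j(p) = e^{2\pi i x_j p}$ are linearly independent on $[-1,1]$ since $x_i \neq x_j$, it follows that $c_j = 0$ for all $j$. Clearly $K$ is a $C_0$-kernel, but it is not universal (see Example [11]). □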

In the next remark we show that compact-universality is exactly what is called universality in [5].

Remark 1. In [5], a Mercer kernel $K$ is said to be universal if, for each compact set $Z \subset X$,

\[ C(Z;\mathcal{Y}) = \|\cdot\|_\infty\text{-cl}\,\operatorname{span}\{ K(\cdot,x)v|_Z \mid x \in Z,\ v \in \mathcal{Y} \}, \qquad (17) \]

where $\|\cdot\|_\infty$-cl denotes the closure in $C(Z;\mathcal{Y})$ with the uniform norm topology. This is equivalent to requiring that $\mathcal{H}_K$ is dense in $C(X;\mathcal{Y})$ with the compact-open topology, that is, by Theorem [3], that $K$ is compact-universal. Indeed, by definition of the compact-open topology, $\mathcal{H}_K$ is dense in $C(X;\mathcal{Y})$ if and only if

\[ C(Z;\mathcal{Y}) = \|\cdot\|_\infty\text{-cl}\ \mathcal{H}_K|_Z \qquad (18) \]

for all compact $Z \subset X$.

Clearly (17) implies (18). Suppose on the other hand that (18) holds true. Denote by $\tilde K$ the restriction of $K$ to $Z \times Z$. Since convergence in $\mathcal{H}_{\tilde K}$ implies uniform convergence, we have

\[ \|\cdot\|_\infty\text{-cl}\,\operatorname{span}\{ K(\cdot,x)v|_Z \mid x \in Z,\ v \in \mathcal{Y} \} \supseteq \mathcal{H}_{\tilde K} . \]

On the other hand, $\mathcal{H}_{\tilde K} = \mathcal{H}_K|_Z$ as a linear space of functions (see Corollary [2]). Hence (18) implies (17).

5 Translation invariant kernels and universality

In this section we assume that $X$ is a locally compact second countable topological group with identity $e$ and we study the reproducing kernels that are translation invariant, namely

\[ K(zx, zt) = K(x,t) \qquad \text{for all } x, t, z \in X. \qquad (19) \]

In particular we characterize all the translation invariant kernels in terms of a unitary representation of $X$ acting on an arbitrary Hilbert space $\mathcal{H}$ and an operator $A : \mathcal{H} \to \mathcal{Y}$. If $X$ is an abelian group, we give a more explicit characterization below, and Theorem [T3] provides a sufficient condition ensuring that the corresponding reproducing kernel Hilbert space is universal. This condition is also necessary if $X$ is compact or $\mathcal{Y} = \mathbb{C}$. For scalar kernels on $\mathbb{R}^d$ our result has already been proved in [22].

By a representation $\pi$ of $X$ on a vector space $V$ we mean a group homomorphism from $X$ to the automorphisms of $V$. In particular, if $V$ is a Hilbert space, $\pi$ is unitary if it takes values in the group of unitary operators on $V$. In this framework the representation is called continuous if $\pi$ is strongly continuous (see [IS]).

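We denote by $\lambda$ the left regular representation of $X$ acting on $\mathcal{F}(X;\mathcal{Y})$, namely

\[ (\lambda_x f)(t) = f(x^{-1}t), \qquad t, x \in X,\ f \in \mathcal{F}(X;\mathcal{Y}). \]

We recall that a function $\Gamma : X \to \mathcal{L}(\mathcal{Y})$ is of completely positive type if

\[ \sum_{i,j=1}^N \langle \Gamma(x_j^{-1}x_i) y_j, y_i\rangle \ge 0 \qquad (20) \]

for all finite sequences $\{x_i\}_{i=1,\dots,N}$ in $X$ and $\{y_i\}_{i=1,\dots,N}$ in $\mathcal{Y}$.
The following facts are easy to prove.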

Proposition 11. Let $K : X \times X \to \mathcal{L}(\mathcal{Y})$ be a reproducing kernel. The following conditions are equivalent.

(a) $K$ is a translation invariant reproducing kernel.

(b) There is a function $K_e : X \to \mathcal{L}(\mathcal{Y})$ of completely positive type such that $K(x,t) = K_e(t^{-1}x)$.

If one of the above conditions is satisfied, then the representation $\lambda$ leaves $\mathcal{H}_K$ invariant, its action on $\mathcal{H}_K$ is unitary and

\[ K(x,t) = K_e^* \lambda_{x^{-1}t} K_e, \qquad x, t \in X, \qquad (21) \]

\[ \|K(x,x)\| = \|K_e(e)\|, \qquad x \in X. \qquad (22) \]

The notation $K_e$ for the function of completely positive type associated with the reproducing kernel $K$ is consistent with the definition given by ([T]) since

\[ (K_e y)(x) = K_e(x) y, \qquad y \in \mathcal{Y},\ x \in X. \]

Proof of Proposition [11]. Assume (a). Given $x, t \in X$, ([T]) and (19) give

\[ K_e(t^{-1}x) = K(t^{-1}x, e) = K(x,t). \]

Since $K$ is a reproducing kernel, $K_e$ is of completely positive type, so that (b) holds true.

Assume (b). Clearly $K$ is a translation invariant reproducing kernel, so that (a) holds true.

Suppose now that $K$ is a translation invariant reproducing kernel. Observe that, given $t \in X$ and $y \in \mathcal{Y}$,

\[ (\lambda_x K_t y)(z) = (K_t y)(x^{-1}z) = K(x^{-1}z, t)y = K(z, xt)y = (K_{xt} y)(z), \qquad x, z \in X, \]

that is, $\lambda_x K_t = K_{xt}$. Moreover

\[ \langle \lambda_x K_{t_1} y_1, \lambda_x K_{t_2} y_2\rangle_K = \langle K_{xt_1} y_1, K_{xt_2} y_2\rangle_K = \langle K(xt_2, xt_1) y_1, y_2\rangle = \langle K(t_2, t_1) y_1, y_2\rangle = \langle K_{t_1} y_1, K_{t_2} y_2\rangle_K . \]

This means that $\lambda$ leaves the set $\{ K_x y \mid x \in X,\ y \in \mathcal{Y} \}$ invariant and its action is unitary. The first two claims now follow recalling that $\{ K_x y \mid x \in X,\ y \in \mathcal{Y} \}$ is total in $\mathcal{H}_K$. To prove (21) observe that

\[ K(x,t) = K_x^* K_t = K_e^* \lambda_x^* \lambda_t K_e = K_e^* \lambda_{x^{-1}t} K_e \]

for all $x, t \in X$. □



Notice that, if $K$ is a translation invariant kernel, (22) implies that the elements of $\mathcal{H}_K$ are bounded functions. The following lemma characterizes the translation invariant kernels that are Mercer or $C_0$.

Lemma 1. Let $K_e : X \to \mathcal{L}(\mathcal{Y})$ be a function of completely positive type and let $K$ be the corresponding translation invariant reproducing kernel. The following conditions are equivalent.

(a) The map $K$ is a Mercer kernel.

(b) For all $y \in \mathcal{Y}$, $K_e(\cdot)y \in C(X;\mathcal{Y})$.

(c) The representation $\lambda$ is continuous on $\mathcal{H}_K$.

Moreover, the map $K$ is a $C_0$-kernel if and only if $K_e(\cdot)y \in C_0(X;\mathcal{Y})$ for all $y \in \mathcal{Y}$.

Proof. The equivalence between (a) and (b), as well as the statement about $C_0$-kernels, is a consequence of Proposition [2], observing that $(K_x y)(t) = K_e(x^{-1}t)y$ and that (22) holds.

Assume that $K$ is a Mercer kernel. Since $\lambda$ is a unitary representation and the set $\{ K_t y \mid t \in X,\ y \in \mathcal{Y} \}$ is total in $\mathcal{H}_K$, it is enough to check that for any $t \in X$ and $y \in \mathcal{Y}$ the function $x \mapsto \lambda_x K_t y$ is continuous at the identity. Indeed, observe that

\[ \|\lambda_x K_t y - K_t y\|_K^2 = \|K_{xt} y - K_t y\|_K^2 = \langle (K(xt,xt) - K(t,xt) - K(xt,t) + K(t,t))\, y, y\rangle = \langle (2K_e(e) - K_e(t^{-1}x^{-1}t) - K_e(t^{-1}xt))\, y, y\rangle, \]

which is continuous at the identity by assumption on $K_e$. Conversely, if $\lambda$ is continuous, (21) gives that

\[ K_e(x) y = K(x,e) y = K_e^* \lambda_{x^{-1}} K_e y, \]

so that $K_e(\cdot)y$ is continuous. □
The following proposition characterizes the translation invariant reproducing kernels.

Proposition 12. Let $\pi$ be a unitary representation of $X$ acting on a separable Hilbert space $\mathcal{H}$ and $A : \mathcal{H} \to \mathcal{Y}$ a bounded operator. Define

\[ W : \mathcal{H} \to \mathcal{F}(X;\mathcal{Y}), \qquad (Wv)(x) = A\pi_{x^{-1}} v . \qquad (23) \]

$W$ is a unitary map from $\ker W^\perp$ onto the reproducing kernel Hilbert space $\mathcal{H}_K$ with translation invariant kernel

\[ K(x,t) = A\pi_{x^{-1}t}A^*, \qquad x, t \in X. \qquad (24) \]

Moreover $W$ intertwines the representations $\pi$ and $\lambda$. Finally, $W$ is unitary if and only if the only $\pi$-invariant closed subspace of $\ker A$ is the null space.

Proof. Define $\gamma_x : \mathcal{Y} \to \mathcal{H}$ as $\gamma_x = \pi_x A^*$, so that $(Wv)(x) = \gamma_x^* v = A\pi_{x^{-1}}v$. The claim is now a consequence of Proposition [H], up to the last statement. The fact that $W$ intertwines $\pi$ with $\lambda$ is trivial. Finally, by Proposition [U], $W$ is unitary if and only if it is injective. By definition

\[ \ker W = \{ v \in \mathcal{H} \mid \pi_x v \in \ker A\ \ \forall x \in X \}. \]

Hence $\ker W$ is a closed subspace of $\ker A$ invariant with respect to $\pi$. Conversely, any $\pi$-invariant closed subspace of $\ker A$ is contained in $\ker W$. □

Propositions [11] and [12] show that any translation invariant kernel is of the form $K(x,t) = A\pi_{x^{-1}t}A^*$ for some unitary representation $\pi$ acting on a Hilbert space $\mathcal{H}$ and a bounded operator $A : \mathcal{H} \to \mathcal{Y}$. In particular, if $\pi$ is a continuous representation, then $K$ is a Mercer kernel, and for any Mercer kernel $\pi$ can be assumed to be continuous and $\mathcal{H}$ separable. Moreover, the reproducing kernel Hilbert space $\mathcal{H}_K$ is embedded in $\mathcal{H}$ by the feature operator $W$ defined by (23). Observe that if the representation $\pi$ is irreducible or if $A$ is injective, then $W$ is unitary.

If $\mathcal{Y} = \mathbb{C}$, the operator $A$ is of the form $Av = \langle v, w\rangle_{\mathcal{H}}$ for some $w \in \mathcal{H}$, so that $(Wv)(x) = \langle v, \pi_x w\rangle_{\mathcal{H}}$. This operator is well known in harmonic analysis as the wavelet operator [T].



Remark 2. Notice that any translation invariant kernel $K$ is the sum of translation invariant kernels associated with cyclic representations. Indeed, let $\pi$ be a unitary representation defining $K$ by means of (24). Since any unitary representation is the direct sum of a family of cyclic representations, $\mathcal{H} = \bigoplus_{i\in I}\mathcal{H}_i$, where each $\mathcal{H}_i$ is a closed $\pi$-invariant subspace and the action of $\pi$ on $\mathcal{H}_i$ is cyclic. Denote by $P_i$ the orthogonal projection on $\mathcal{H}_i$, then

\[ K(x,t) = \sum_{i\in I} A P_i \pi_{x^{-1}t} A^* = \sum_{i\in I} K^i(x,t), \]

where the series converges in the strong operator topology and the reproducing kernels $K^i$ are $K^i(x,t) = A_i \pi^i_{x^{-1}t} A_i^*$, where $\pi^i$ and $A_i$ are the restrictions of $\pi$ and $A$ to $\mathcal{H}_i$, respectively. Proposition implies that $\mathcal{H}_K = \sum_{i\in I}\mathcal{H}_{K^i}$. For scalar kernels, we can always assume that $\pi$ is cyclic itself. Indeed, the wavelet operator is $(Wv)(x) = \langle v, \pi_x w\rangle_{\mathcal{H}}$ for some $w \in \mathcal{H}$, so that the associated kernel $K$ is determined only by the cyclic subrepresentation of $\pi$ containing $w$.

5.1 Abelian groups 

In this section, we specialize the previous discussion to the case in which 
X is an abelian group. With this assumption, we can give a more explicit 
construction of translation invariant Mercer kernels, which is related to a 
generalization of Bochner theorem for scalar functions of positive type, [21 [T3]. 

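We denote the product in $X$ additively and the identity by $0$, since the main example is $\mathbb{R}^d$. We let $\hat{X}$ be the dual group of $X$ and we denote by $dx$ the Haar measure on $X$.

Now we briefly recall the definition of the Fourier transform. If $\phi \in L^1(X, dx; \mathcal{Y})$, its Fourier transform $\mathcal{F}(\phi) : \hat{X} \to \mathcal{Y}$ is given by

\[ \mathcal{F}(\phi)(\chi) = \int_X \overline{\chi(x)}\, \phi(x)\, dx . \]

We denote by $d\chi$ the Haar measure on $\hat{X}$ normalized so that $\mathcal{F}$ extends to a unitary operator from $L^2(X, dx; \mathcal{Y})$ onto $L^2(\hat{X}, d\chi; \mathcal{Y})$. If $\hat\mu$ is a positive measure on $\hat{X}$ and $\varphi \in L^1(\hat{X}, \hat\mu; \mathcal{Y})$, let $\mathcal{F}(\varphi\hat\mu) : X \to \mathcal{Y}$ be given by

\[ \mathcal{F}(\varphi\hat\mu)(x) = \int_{\hat{X}} \overline{\chi(x)}\, \varphi(\chi)\, d\hat\mu(\chi) . \]

If $\hat\mu$ is a complex measure¹ on $\hat{X}$, we denote $\mathcal{F}(\hat\mu) = \mathcal{F}(h|\hat\mu|)$, where $|\hat\mu|$ is the total variation of $\hat\mu$ and $h \in L^1(\hat{X}, |\hat\mu|)$ is the density of $\hat\mu$ with respect to $|\hat\mu|$. By general properties of the Fourier transform, $\mathcal{F}(\phi)$ and $\mathcal{F}(\hat\mu)$ are bounded continuous functions, on $\hat{X}$ and on $X$ respectively (actually, $\mathcal{F}(\phi) \in C_0(\hat{X}; \mathcal{Y})$). Moreover, $\mathcal{F}(\phi) = 0$ [respectively, $\mathcal{F}(\hat\mu) = 0$] if and only if $\phi = 0$ in $L^1(X, dx; \mathcal{Y})$ [resp., $\hat\mu = 0$].

¹ That is, a $\sigma$-additive map $\hat\mu : \mathcal{B}(\hat{X}) \to \mathbb{C}$.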
We recall that a positive operator valued measure (POVM) on $\hat{X}$ with values in $\mathcal{Y}$ is a map $Q : \mathcal{B}(\hat{X}) \to \mathcal{L}(\mathcal{Y})$ such that $Q(Z) \ge 0$ for all $Z \in \mathcal{B}(\hat{X})$, and

\[ Q\Big(\bigcup_i Z_i\Big) = \sum_i Q(Z_i) \]

for every denumerable sequence of disjoint Borel sets $\{Z_i\}_i$, where the sum converges in the weak operator topology. A positive operator valued measure $Q$ is a projection valued measure if $Q(Z)^2 = Q(Z)$ for all $Z \in \mathcal{B}(\hat{X})$. If $f : \hat{X} \to \mathbb{C}$ is a bounded measurable function, $\int_{\hat{X}} f(\chi)\, dQ(\chi)$ is the unique bounded operator $f(Q)$ defined by

\[ \langle f(Q)y, y'\rangle = \int_{\hat{X}} f(\chi)\, dQ_{y,y'}(\chi), \qquad y, y' \in \mathcal{Y}, \]

where $Q_{y,y'}$ is the complex measure on $\hat{X}$ given by $Q_{y,y'}(Z) = \langle Q(Z)y, y'\rangle$ for all Borel subsets $Z$.

Next theorem shows that there is a one to one correspondence between 
translation invariant Mercer kernels on X and positive operator valued mea- 
sures on X. For scalar kernels this result is Bochner theorem [2]. For vector 
valued kernels, it is proved in [TU [TS] under the weaker assumption that Kq 
is a function of positive type, namely that 

\[ \sum_{i,j=1}^N c_i \overline{c_j}\, \langle K_0(x_i - x_j) y, y\rangle \ge 0 \qquad (25) \]

for all finite sequences $\{x_i\}_{i=1,\dots,N}$ in $X$, $\{c_i\}_{i=1,\dots,N}$ in $\mathbb{C}$ and $y \in \mathcal{Y}$. The fact that conditions (20) and (25) are equivalent for abelian groups is a consequence of [10, Lemma 3.1]. In the following, assuming (20), we give a
proof simpler than the one provided in P^ [T3] . 

Theorem 5. If $Q : \mathcal{B}(\hat{X}) \to \mathcal{L}(\mathcal{Y})$ is a positive operator valued measure, then

\[ K(x,t) = \int_{\hat{X}} \chi(t-x)\, dQ(\chi) \qquad (26) \]

is a translation invariant $\mathcal{Y}$-Mercer kernel on $X$. Conversely, if $K$ is a translation invariant $\mathcal{Y}$-Mercer kernel on $X$, then there exists a unique positive operator valued measure $Q$ such that (26) holds.

We say that $Q$ in (26) is the positive operator valued measure associated to the translation invariant Mercer kernel $K$.
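A finite analogue makes the direct implication of this theorem concrete. The sketch below is an illustration under assumptions that are not in the paper: $X$ is the cyclic group $\mathbb{Z}_N$, whose characters are $\chi_k(x) = e^{2\pi i k x/N}$, and the positive operator valued measure is a family of $N$ randomly generated positive semi-definite $m \times m$ matrices $Q_k$, one for each character. The kernel $K(x,t) = \sum_k \chi_k(t-x)\, Q_k$ should then be translation invariant, and its block Gram matrix should be Hermitian and positive semi-definite. The helper names are hypothetical.

```python
import numpy as np

def random_povm(N, m, rng):
    """N random positive semi-definite m x m matrices, one per character of Z_N."""
    Q = np.empty((N, m, m), dtype=complex)
    for k in range(N):
        M = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
        Q[k] = M @ M.conj().T
    return Q

def kernel(x, t, Q):
    """K(x, t) = sum_k chi_k(t - x) Q_k with chi_k(u) = exp(2 pi i k u / N)."""
    N = Q.shape[0]
    chi = np.exp(2j * np.pi * np.arange(N) * (t - x) / N)
    return np.tensordot(chi, Q, axes=1)

def block_gram(points, Q):
    """Block matrix with (i, j) block K(x_i, x_j)."""
    return np.block([[kernel(xi, xj, Q) for xj in points] for xi in points])

rng = np.random.default_rng(1)
N, m = 7, 2
Q = random_povm(N, m, rng)
points = list(range(N))

G = block_gram(points, Q)
min_eig = np.linalg.eigvalsh(G).min()
# Translation invariance: K(x + z, t + z) = K(x, t), with addition modulo N.
shift_error = max(
    np.abs(kernel((x + 3) % N, (t + 3) % N, Q) - kernel(x, t, Q)).max()
    for x in points for t in points
)
print(min_eig, shift_error)
```

Positivity follows here from the identity $\sum_{i,j}\langle K(x_i,x_j)y_j, y_i\rangle = \sum_k \langle Q_k v_k, v_k\rangle$ with $v_k = \sum_j \chi_k(x_j) y_j$, which is the finite counterpart of (26).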
Proof of Theorem [5]. If $Q : \mathcal{B}(\hat{X}) \to \mathcal{L}(\mathcal{Y})$ is a positive operator valued measure, by the Neumark dilation theorem [21] there exist a separable Hilbert space $\mathcal{H}$, a projection valued measure $P : \mathcal{B}(\hat{X}) \to \mathcal{L}(\mathcal{H})$ and a bounded operator $A : \mathcal{H} \to \mathcal{Y}$ such that

\[ Q(Z) = A P(Z) A^* \qquad \forall Z \in \mathcal{B}(\hat{X}). \qquad (27) \]

Let $\pi$ be the continuous unitary representation of $X$ acting on $\mathcal{H}$ given by

\[ \pi(x) = \int_{\hat{X}} \chi(x)\, dP(\chi), \qquad (28) \]

see [in]. Eq. (26) then becomes $K(x,t) = A\pi_{t-x}A^*$, so that $K$ is a translation invariant Mercer kernel by Proposition [12] and Lemma [1].
Conversely, by Proposition [12] and Lemma [1], every translation invariant Mercer kernel is of the form $K(x,t) = A\pi_{t-x}A^*$ for some continuous unitary representation $\pi$ of $X$ in a separable Hilbert space $\mathcal{H}$ and some bounded operator $A : \mathcal{H} \to \mathcal{Y}$. By the SNAG theorem [16], there is then a projection valued measure $P : \mathcal{B}(\hat{X}) \to \mathcal{L}(\mathcal{H})$ such that (28) holds, and (26) follows defining the POVM $Q$ as in (27).
Finally, uniqueness of $Q$ follows from

\[ \langle K_0(x) y, y'\rangle = \int_{\hat{X}} \overline{\chi(x)}\, dQ_{y,y'}(\chi) = \mathcal{F}(Q_{y,y'})(x) \]

by injectivity of the Fourier transform of measures on $\hat{X}$. □

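The next theorem is a useful tool to construct translation invariant Mercer kernels.

Theorem 6. Let $\hat\nu$ be a measure on $\hat{X}$ and $A : L^2(\hat{X}, \hat\nu; \mathcal{Y}) \to \mathcal{Y}$ be a bounded operator. For all $y, y' \in \mathcal{Y}$ let

\[ \langle K(x,t) y, y'\rangle = \int_{\hat{X}} \chi(t-x)\, \langle (A^*y)(\chi), (A^*y')(\chi)\rangle\, d\hat\nu(\chi). \qquad (29) \]

Then $K$ is a translation invariant Mercer kernel and the corresponding reproducing kernel Hilbert space is embedded in $L^2(\hat{X}, \hat\nu; \mathcal{Y})$ by means of the feature operator $W : L^2(\hat{X}, \hat\nu; \mathcal{Y}) \to \mathcal{H}_K$

\[ (Wf)(x) = A f^x \quad\text{where}\quad f^x(\chi) = \overline{\chi(x)}\, f(\chi), \qquad (30) \]

\[ \Big( \langle (Wf)(x), y\rangle = \int_{\hat{X}} \overline{\chi(x)}\, \langle f(\chi), (A^*y)(\chi)\rangle\, d\hat\nu(\chi) \Big). \]

Conversely, any translation invariant Mercer kernel is of the above form for some positive measure $\hat\nu$ and bounded operator $A : L^2(\hat{X}, \hat\nu; \mathcal{Y}) \to \mathcal{Y}$.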
Proof. If $\hat\nu$ is a measure on $\hat{X}$ and $A : L^2(\hat{X}, \hat\nu; \mathcal{Y}) \to \mathcal{Y}$ is a bounded operator, then

\[ \langle Q(Z) y, y'\rangle = \int_Z \langle (A^*y)(\chi), (A^*y')(\chi)\rangle\, d\hat\nu(\chi) \qquad \forall Z \in \mathcal{B}(\hat{X}),\ y, y' \in \mathcal{Y} \]

defines a positive operator valued measure $Q : \mathcal{B}(\hat{X}) \to \mathcal{L}(\mathcal{Y})$, since $Q(Z) = A P(Z) A^*$, where $P(Z)$ is the multiplication by the characteristic function of $Z$. The kernel $K$ given in (29) is then the translation invariant Mercer kernel associated to $Q$ by (26). To prove (30), set

\[ \gamma_x : \mathcal{Y} \to L^2(\hat{X}, \hat\nu; \mathcal{Y}), \qquad (\gamma_x y)(\chi) = \chi(x)\, (A^*y)(\chi), \]

so that $K(x,t) = \gamma_x^* \gamma_t$ and

\[ \langle \gamma_x^* f, y\rangle = \int_{\hat{X}} \overline{\chi(x)}\, \langle f(\chi), (A^*y)(\chi)\rangle\, d\hat\nu(\chi) = \langle A f^x, y\rangle \]

for all $f \in L^2(\hat{X}, \hat\nu; \mathcal{Y})$.

Conversely, assume that $K$ is a translation invariant Mercer kernel. We first consider the case that $\mathcal{Y}$ is infinite-dimensional. Propositions [11] and [12] show that $K$ is of the form $K(x,t) = A\pi_{t-x}A^*$ for some unitary continuous representation $\pi$ acting on a separable Hilbert space $\mathcal{H}$ and a bounded operator $A : \mathcal{H} \to \mathcal{Y}$.

A basic result of commutative harmonic analysis (see [^) ensures that, for each $n \in \mathbb{N}_* := \mathbb{N} \cup \{\infty\}$, there exist a complex separable Hilbert space $\mathcal{Y}_n$ of dimension $n$, and a measurable subset $\hat{X}_n$ of $\hat{X}$ endowed with a positive measure $\hat\nu_n$ such that the $\hat{X}_n$ are disjoint and cover $\hat{X}$. Without loss of generality, we can assume that $\hat\nu_n(\hat{X}_n) \le 2^{-n}$ and $\hat\nu_\infty(\hat{X}_\infty) \le 1$. Moreover there exists a unitary operator $U : \mathcal{H} \to \bigoplus_n L^2(\hat{X}_n, \hat\nu_n; \mathcal{Y}_n)$ such that

\[ (U \pi_x U^* f_n)(\chi) = \chi(x)\, f_n(\chi), \qquad f_n \in L^2(\hat{X}_n, \hat\nu_n; \mathcal{Y}_n). \]

For each $n \in \mathbb{N}_*$, let $J_n : \mathcal{Y}_n \to \mathcal{Y}$ be a fixed isometry, which always exists since $\mathcal{Y}$ is infinite dimensional, and consider the Hilbert space $L^2(\hat{X}, \hat\nu; \mathcal{Y})$, where $\hat\nu = \sum_n \hat\nu_n$, which is a bounded measure by assumption on $\hat\nu_n$. Define the isometry $V : \mathcal{H} \to L^2(\hat{X}, \hat\nu; \mathcal{Y})$ as

\[ (Vv)(\chi) = J_n (Uv)(\chi), \qquad \chi \in \hat{X}_n . \]

A simple calculation shows that

\[ \pi_x = V^* \hat\lambda_x V, \]

where $(\hat\lambda_x f)(\chi) = \chi(x) f(\chi)$ is the diagonal representation on $L^2(\hat{X}, \hat\nu; \mathcal{Y})$. Now

\[ K(x,t) = A\pi_{t-x}A^* = A V^* \hat\lambda_{t-x} V A^* . \]

Redefining $A = AV^*$, (29) is a consequence of the explicit form of $\hat\lambda_x$.

If $\mathcal{Y}$ is finite dimensional, let $(\hat\nu, B)$ be the pair associated to $K$ as in Proposition [13] below. Eq. (29) follows defining $A : L^2(\hat{X}, \hat\nu; \mathcal{Y}) \to \mathcal{Y}$ by

\[ \langle Af, y\rangle = \int_{\hat{X}} \langle B(\chi)^{1/2} f(\chi), y\rangle\, d\hat\nu(\chi). \]

□



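If $\mathcal{Y} = \mathbb{C}^m$, $K(x,t)$ can be regarded as an $m \times m$ matrix and $A$ is uniquely defined by a family of functions $f_1, \dots, f_m \in L^2(\hat{X}, \hat\nu; \mathcal{Y})$ through $A^* e_i = f_i$. Hence (29) becomes

\[ K(t-x)_{ij} = \int_{\hat{X}} \chi(t-x)\, \langle f_j(\chi), f_i(\chi)\rangle\, d\hat\nu(\chi), \qquad i, j = 1, \dots, m. \qquad (31) \]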

As an application, we give the following example that generalizes the one 
given in [5]. 

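Example 10. Let $X = \mathbb{R}^d$, regarded as vector abelian group, and $\mathcal{Y} = \mathbb{C}^m$. The dual group is isomorphic to $\mathbb{R}^d$ by means of $\chi_p(x) = e^{i p\cdot x}$. Let $\hat\nu = dp$ be the Lebesgue measure on $\mathbb{R}^d$ and

\[ f_i(p) = \frac{1}{(2\pi)^{d/4}}\, e^{-\sigma_i^2 |p|^2/2}\, v_i, \qquad \sigma_i > 0,\ v_i \in \mathbb{C}^m, \]

then the translation invariant Mercer kernel given by (31) is

\[ K(t-x)_{ij} = \int_{\mathbb{R}^d} e^{i p\cdot(t-x)}\, \langle f_j(p), f_i(p)\rangle\, dp = \frac{1}{(\sigma_i^2+\sigma_j^2)^{d/2}}\, e^{-\frac{|x-t|^2}{2(\sigma_i^2+\sigma_j^2)}}\, \langle v_j, v_i\rangle . \]

The example in [5] corresponds to the choice $v_i = v_j$ and $\sigma_i = \sigma_j$ for any $i, j = 1, \dots, m$.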

Theorems [5] and [6] give two different characterizations of a translation invariant kernel $K$, but the POVM $Q$ defining $K$ through (26) is always unique, whereas there are many pairs $(\hat\nu, A)$ defining the same $K$ by (29). These two descriptions are related observing that, given a pair $(\hat\nu, A)$, the scalar bounded measure $Q_{y,y'}$ has density $\langle (A^*y)(\chi), (A^*y')(\chi)\rangle$ with respect to $\hat\nu$ for any $y, y' \in \mathcal{Y}$. On the other hand, given the POVM $Q$, let $\hat\nu_Q$ be the bounded positive measure defined by

\[ \hat\nu_Q(Z) = \sum_n 2^{-n}\, \|y_n\|^{-2}\, \langle Q(Z) y_n, y_n\rangle \qquad \forall Z \in \mathcal{B}(\hat{X}), \qquad (32) \]

where $\{y_n\}_{n\in\mathbb{N}}$ is a dense sequence in $\mathcal{Y}$. Clearly, given $Z \in \mathcal{B}(\hat{X})$, $\hat\nu_Q(Z) = 0$ if and only if $Q(Z) = 0$, and $\hat\nu_Q$ is uniquely defined by $Q$ up to an equivalence. Moreover, by the Neumark dilation theorem, see (27), there exists an operator $A_Q : L^2(\hat{X}, \hat\nu_Q; \mathcal{Y}) \to \mathcal{Y}$ such that the pair $(\hat\nu_Q, A_Q)$ gives the kernel $K$ associated with $Q$.

We notice that in general it is not true that the POVM $Q$ has an operator valued density. We recall that $Q$ has operator density if there exist a map $B : \hat{X} \to \mathcal{L}(\mathcal{Y})$ and a positive measure $\hat\nu$ such that $\langle B(\cdot)y, y'\rangle \in L^1(\hat{X}, \hat\nu)$ for all $y, y' \in \mathcal{Y}$ and

\[ \int_Z \langle B(\chi) y, y'\rangle\, d\hat\nu(\chi) = Q_{y,y'}(Z) \qquad \forall Z \in \mathcal{B}(\hat{X}). \qquad (33) \]

The following proposition characterizes the kernels having a POVM with an operator density. To prove the result, we need the following technical lemma.

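Lemma 2. Let $\hat\nu$ be a positive measure on $\hat{X}$ and $B : \hat{X} \to \mathcal{L}(\mathcal{Y})$ such that $\langle B(\cdot)y, y'\rangle \in L^1(\hat{X}, \hat\nu)$ for all $y, y' \in \mathcal{Y}$. Then the sesquilinear form

\[ \mathcal{Y} \times \mathcal{Y} \to L^1(\hat{X}, \hat\nu), \qquad (y, y') \mapsto \langle B(\cdot)y, y'\rangle \qquad (34) \]

is continuous.

Proof. For fixed $y \in \mathcal{Y}$ [resp. $y' \in \mathcal{Y}$] the map $y' \mapsto \langle B(\cdot)y, y'\rangle$ [resp. $y \mapsto \langle B(\cdot)y, y'\rangle$] is continuous from $\mathcal{Y}$ into $L^1(\hat{X}, \hat\nu)$ by the closed graph theorem, i.e. the map defined in (34) is separately continuous in $y$ and $y'$. So the closed graph theorem again ensures the joint continuity. □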

Proposition 13. Let $\nu$ be a positive measure on $\hat{X}$ and $B : \hat{X} \to \mathcal{L}(\mathcal{Y})$ such that $\langle B(\cdot)y, y'\rangle \in L^1(\hat{X}, \nu)$ for all $y, y' \in \mathcal{Y}$ and $B(\chi) \ge 0$ for $\nu$-almost all $\chi$. Then

\[ K(x,t) = \int_{\hat{X}} \chi(t-x)\, B(\chi)\, d\nu(\chi) \tag{35} \]

is a translation invariant Mercer kernel, and the space $\mathcal{H}_K$ is embedded in $L^2(\hat{X}, \nu; \mathcal{Y})$ by means of the feature operator

\[ (Wf)(x) = \int_{\hat{X}} \overline{\chi(x)}\, B(\chi)^{1/2} f(\chi)\, d\nu(\chi), \tag{36} \]

where both the above integrals converge in the weak sense.

If $\mathcal{Y}$ is finite dimensional or $X$ is compact, any translation invariant kernel is of the above form for some pair $(\nu, B)$.

If $\mathcal{Y} = \mathbb{C}$, one can always assume that $B = 1$ and $\nu$ is a bounded positive measure.

Proof. Let $\nu$ and $B$ be as in the assumptions. Given a Borel subset $Z$ of $\hat{X}$, define $Q(Z)$ as the unique bounded operator satisfying

\[ \langle Q(Z)y, y'\rangle = \int_Z \langle B(\chi)y, y'\rangle\, d\nu(\chi). \]

The fact that $Q(Z)$ is a bounded operator follows from Lemma 2 and from the continuity of the map $L^1(\hat{X},\nu) \ni \varphi \mapsto \int_Z \varphi(\chi)\, d\nu(\chi) \in \mathbb{C}$. Clearly, $Q(Z)$ is a positive operator and the monotone convergence theorem implies that $Z \mapsto Q(Z)$ is a POVM on $\hat{X}$. By construction $K(x,t) = \int_{\hat{X}} \chi(t-x)\, dQ(\chi)$, so $K$ is a translation invariant Mercer kernel by Theorem 5. Setting

\[ \gamma_x : \mathcal{Y} \to L^2(\hat{X}, \nu; \mathcal{Y}), \qquad (\gamma_x y)(\chi) = \chi(x)\, B(\chi)^{1/2} y, \]

we see that $K(x,t) = \gamma_x^* \gamma_t$ and

\[ \langle (Wf)(x), y\rangle = \langle f, \gamma_x y\rangle_2 = \int_{\hat{X}} \langle f(\chi), \chi(x) B(\chi)^{1/2} y\rangle\, d\nu(\chi) = \int_{\hat{X}} \overline{\chi(x)}\, \langle B(\chi)^{1/2} f(\chi), y\rangle\, d\nu(\chi) \]

for all $f \in L^2(\hat{X}, \nu; \mathcal{Y})$, from which (36) follows.

Assume now that $\mathcal{Y}$ is finite dimensional or $X$ is compact and $K$ is a translation invariant Mercer kernel. Theorem 5 ensures that there exists a POVM $Q$ on $\hat{X}$ taking values in $\mathcal{Y}$ such that $K(x,t) = \int_{\hat{X}} \chi(t-x)\, dQ(\chi)$. If $X$ is compact, $\hat{X}$ is discrete. Let $\nu$ be the counting measure and $B(\chi) = Q(\{\chi\})$ for all $\chi \in \hat{X}$; then $(\nu, B)$ satisfies the required properties.
If $\mathcal{Y}$ is finite dimensional, choose $\nu_Q$ as in (32). It follows that for any $y, y' \in \mathcal{Y}$, the complex measure $Q_{y,y'}$ has density $b_{y,y'} \in L^1(\hat{X}, \nu_Q)$ with respect to $\nu_Q$. In particular, $b_{y,y}(\chi) \ge 0$ for $\nu_Q$-almost all $\chi \in \hat{X}$. Let $y_1, \dots, y_N$ be a basis of $\mathcal{Y}$ and by linearity extend $b_{y_i,y_j} \in L^1(\hat{X}, \nu_Q)$ to a map $B : \hat{X} \to \mathcal{L}(\mathcal{Y})$, which clearly satisfies the required properties.
If $\mathcal{Y} = \mathbb{C}$, the claim is clear. □
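The construction (35) can be tried out on a finite group. The following Python sketch takes $X = \mathbb{Z}_5$, whose characters are $\chi_k(x) = e^{2\pi i k x/5}$, the counting measure on the dual group, and $\mathcal{Y} = \mathbb{C}^2$. The rank-one densities `B(k)` are hypothetical choices made only for illustration, and all names are ours. The sketch builds $K(x,t) = \sum_k \chi_k(t-x) B(k)$ and evaluates the quadratic form $\sum_{x,t} \langle K(x,t) y_t, y_x\rangle$, which should be nonnegative.

```python
import cmath
import math

n, m = 5, 2  # group Z_5 and output space C^2 (illustrative sizes)


def chi(k, x):
    """Character chi_k of Z_n evaluated at x."""
    return cmath.exp(2j * math.pi * k * x / n)


def B(k):
    """Hypothetical positive operator B(k) = b b^*, with b depending on k."""
    b = (1.0, complex(k, 1))
    return [[b[a] * b[c].conjugate() for c in range(m)] for a in range(m)]


def K(x, t):
    """K(x,t) = sum_k chi_k(t - x) B(k), with nu the counting measure on the dual."""
    out = [[0j] * m for _ in range(m)]
    for k in range(n):
        w = chi(k, t - x)
        Bk = B(k)
        for a in range(m):
            for c in range(m):
                out[a][c] += w * Bk[a][c]
    return out


def quadratic_form(ys):
    """sum_{x,t} <K(x,t) y_t, y_x>, inner product linear in the first slot."""
    total = 0j
    for x in range(n):
        for t in range(n):
            Kxt = K(x, t)
            for a in range(m):
                Ky = sum(Kxt[a][c] * ys[t][c] for c in range(m))
                total += Ky * ys[x][a].conjugate()
    return total
```

Translation invariance can be observed by comparing `K(4, 1)` with `K(0, 2)`, since $1 - 4 \equiv 2 - 0$ modulo 5.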



If $\mathcal{Y} = \mathbb{C}$, Proposition 12 is already given in
We end by showing a sufficient condition ensuring that a translation invariant Mercer kernel is of the form given in Proposition 13.

Proposition 14. Let $K$ be a translation invariant Mercer kernel. Suppose that $\langle K_0(\cdot)y, y'\rangle \in L^1(X, dx)$ for all $y, y' \in \mathcal{Y}$. Let

\[ \langle B(\chi)y, y'\rangle := \int_X \chi(x)\, \langle K_0(x)y, y'\rangle\, dx \qquad \forall y, y' \in \mathcal{Y}. \tag{37} \]

Then
(i) $B(\chi)$ is a bounded nonnegative operator for all $\chi \in \hat{X}$;
(ii) $\langle B(\cdot)y, y'\rangle \in L^1(\hat{X}, d\chi)$ for all $y, y' \in \mathcal{Y}$;
(iii) for all $x, t \in X$,

\[ K(x,t) = \int_{\hat{X}} \chi(t-x)\, B(\chi)\, d\chi, \tag{38} \]

where the integral converges in the weak sense.

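Proof. The operator $B(\chi)$ defined in (37) is bounded as a consequence of Lemma 2 (applied to $K_0$) and of the continuity of the map $L^1(X, dx) \ni \varphi \mapsto \mathcal{F}(\varphi)(\chi) \in \mathbb{C}$.
Since $\langle K_0(\cdot)y, y\rangle$ is a function of positive type, by the Fourier inversion theorem $\langle B(\cdot)y, y\rangle \in L^1(\hat{X}, d\chi)$, and

\[ \langle K_0(x)y, y\rangle = \int_{\hat{X}} \overline{\chi(x)}\, \langle B(\chi)y, y\rangle\, d\chi, \]

which is (38). □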
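A finite analogue of (37) and (38) can be checked directly. The following Python sketch works on $X = \mathbb{Z}_8$ with $\mathcal{Y} = \mathbb{C}$, the counting measure on $X$, and the dual Haar measure giving mass $1/8$ to each character. The scalar function `k0` is the autocorrelation of an arbitrary real sequence `g`, which is of positive type and even, so the check does not depend on the orientation chosen for $K_0$. The data and names are hypothetical.

```python
import cmath
import math

n = 8
g = [1.0, 0.5, -0.25, 0.0, 2.0, 0.0, 0.0, 1.5]  # arbitrary illustrative data


def chi(k, x):
    """Character chi_k of Z_n evaluated at x."""
    return cmath.exp(2j * math.pi * k * x / n)


def k0(x):
    """Autocorrelation of g: a real, even function of positive type on Z_n."""
    return sum(g[(s + x) % n] * g[s] for s in range(n))


def B(k):
    """Analogue of (37): sum of chi_k(x) * k0(x) over the counting measure on Z_n."""
    return sum(chi(k, x) * k0(x) for x in range(n))


def kernel_from_B(x, t):
    """Analogue of (38): average of chi_k(t - x) * B(k) over the n characters."""
    return sum(chi(k, t - x) * B(k) for k in range(n)) / n
```

In this setting `B(k)` equals the squared modulus of the discrete Fourier coefficient of `g`, hence it is nonnegative, and `kernel_from_B(x, t)` reproduces `k0(x - t)`.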



5.2 Universality 

In this section we study the universality problem for translation invariant kernels on an abelian group in terms of the characterization given by Theorem 5 and Proposition 13. The assumptions and notations are as in Section 5.1. To state the following result, we recall that the support of a POVM $Q$ is the complement of the largest open subset $U$ such that $Q(U) = 0$.

Proposition 15. Let $K$ be a translation invariant Mercer kernel, and $Q$ its associated positive operator valued measure. If the RKHS $\mathcal{H}_K$ is dense in $L^2(X, \mu; \mathcal{Y})$ for any probability measure $\mu$, then $\operatorname{supp}(Q) = \hat{X}$.

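Proof. Suppose there is an open set $U \subset \hat{X}$ such that $Q(U) = 0$. Let $\chi_0 \in U$, so that $\chi_0 U^{-1}$ is a neighborhood of the identity element of $\hat{X}$. Let $\mu$ be a probability measure$^1$ on $X$ such that $\operatorname{supp} \mathcal{F}(\mu) \subset \chi_0 U^{-1}$ and set $\varphi(x) = \overline{\chi_0(x)}\, y$ with $y \in \mathcal{Y} \setminus \{0\}$. Then the integral representation of $K$ in Theorem 5 gives

\[ \langle L_\mu \varphi, \varphi\rangle = \int_X \int_X \int_{\hat{X}} \chi(t-x)\, \chi_0(x)\, \overline{\chi_0(t)}\, dQ_{y,y}(\chi)\, d\mu(x)\, d\mu(t) = \int_{\hat{X}} |\mathcal{F}(\mu)(\chi_0 \chi^{-1})|^2\, dQ_{y,y}(\chi) = 0, \]

since $\mathcal{F}(\mu)(\chi_0\chi^{-1})$ vanishes unless $\chi \in U$, and $Q(U) = 0$.
This shows that $L_\mu$ is not injective, i.e. $K$ is not universal. □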

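We now characterize the universality of the kernels defined in terms of the pair $(\nu, B)$ by means of (35).

Proposition 16. Given a positive measure $\nu$ on $\hat{X}$ and $B : \hat{X} \to \mathcal{L}(\mathcal{Y})$ such that $\langle B(\cdot)y, y'\rangle \in L^1(\hat{X}, \nu)$ for all $y, y' \in \mathcal{Y}$ and $B(\chi) \ge 0$ for $\nu$-almost all $\chi$, let $K$ be the translation invariant Mercer kernel given by (35).

(i) If $\mathcal{H}_K$ is dense in $L^2(X, \mu; \mathcal{Y})$ for any probability measure $\mu$, then both $\operatorname{supp} \nu = \hat{X}$ and $\operatorname{supp} B = \hat{X}$.

(ii) If $\operatorname{supp} \nu = \hat{X}$ and $B(\chi)$ is injective for $\nu$-almost all $\chi \in \hat{X}$, then $\mathcal{H}_K$ is dense in $L^2(X, \mu; \mathcal{Y})$ for any probability measure $\mu$.
In the case $X$ is compact also the converse holds true.

(iii) If $\mathcal{Y} = \mathbb{C}$ and $B = 1$, $\mathcal{H}_K$ is dense in $L^2(X, \mu; \mathcal{Y})$ for any probability measure $\mu$ if and only if $\operatorname{supp} \nu = \hat{X}$.

Proof. Item (i) follows from Proposition 15 and (33).

Let now $\mu$ be a probability measure on $X$. Using (35), we have

\[ \langle L_\mu \varphi, \varphi\rangle = \int_X \int_X \int_{\hat{X}} \chi(t-x)\, \langle B(\chi)\varphi(t), \varphi(x)\rangle\, d\nu(\chi)\, d\mu(x)\, d\mu(t) = \int_{\hat{X}} \langle B(\chi)\, \mathcal{F}(\varphi\mu)(\chi^{-1}), \mathcal{F}(\varphi\mu)(\chi^{-1})\rangle\, d\nu(\chi). \tag{39} \]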

(ii) If $B(\chi)$ is injective for almost all $\chi \in \hat{X}$ and $\operatorname{supp} \nu = \hat{X}$, then, by the above equation, positivity of $B(\chi)$ and the injectivity of the Fourier transform, $L_\mu \varphi \ne 0$ if $\varphi \ne 0$ in $L^2(X, \mu; \mathcal{Y})$. Therefore, $\mathcal{H}_K$ is dense in $L^2(X, \mu; \mathcal{Y})$ for any probability measure $\mu$.
Suppose $X$ is compact, so that $\hat{X}$ is discrete. If $\mathcal{H}_K$ is dense in $L^2(X, \mu; \mathcal{Y})$ for any probability measure $\mu$, $\operatorname{supp} \nu = \hat{X}$ by item (i). If $\chi_0 \in \hat{X}$ and $y \in \ker B(\chi_0)$, choose $d\mu(x) = dx$ and $\varphi(x) = \overline{\chi_0(x)}\, y$, so that $\mathcal{F}(\varphi\mu)(\chi) = \delta_{\chi, \chi_0^{-1}}\, y$. We thus have

\[ \langle L_\mu \varphi, \varphi\rangle = \langle B(\chi_0)y, y\rangle\, \nu(\{\chi_0\}) = 0. \]

Since $L_\mu$ is injective, this implies $\varphi = 0$, i.e. $y = 0$.

(iii) Since $B = 1$, the 'if' part is clear from item (ii). The converse follows by item (i).

□

$^1$ For example, if $V$ is a compact symmetric neighborhood of the identity of $\hat{X}$ such that $V^2 \subset \chi_0 U^{-1}$, let $h = 1_V * 1_V$, so that (up to a constant) the measure $d\mu(x) = \mathcal{F}^{-1}(h)(x)\, dx = |\mathcal{F}^{-1}(1_V)(x)|^2\, dx$ has the required property.
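The role of the support of $\nu$ in the compact case can be seen on a finite cyclic group. The following Python sketch takes $X = \mathbb{Z}_n$, $\mathcal{Y} = \mathbb{C}$, the uniform probability measure $\mu$ on $X$, and a scalar density given by a list `b` of nonnegative weights, one per character. It evaluates $\langle L_\mu\varphi, \varphi\rangle$ for $\varphi = \overline{\chi_{k_0}}$, which should return the weight `b[k0]`: a vanishing weight therefore produces a nonzero $\varphi$ in the kernel of $L_\mu$. The names and the sample weights are illustrative assumptions.

```python
import cmath
import math


def chi(k, x, n):
    """Character chi_k of Z_n evaluated at x."""
    return cmath.exp(2j * math.pi * k * x / n)


def form_on_character(b, k0):
    """<L_mu phi, phi> for K(x,t) = sum_k b[k] chi_k(t - x), mu uniform on Z_n
    and phi = conj(chi_k0)."""
    n = len(b)
    total = 0j
    for x in range(n):
        for t in range(n):
            kernel = sum(b[k] * chi(k, t - x, n) for k in range(n))
            phi_t = chi(k0, t, n).conjugate()
            phi_x = chi(k0, x, n).conjugate()
            total += kernel * phi_t * phi_x.conjugate()
    return total / (n * n)
```

With `b = [1.0, 0.5, 0.0, 2.0]`, the character with index 2 carries no mass, and the form vanishes on the corresponding $\varphi$.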

By inspecting the proofs of Propositions 15 and 16 one can easily replace $L^2(X, \mu; \mathcal{Y})$ with any $L^p(X, \mu; \mathcal{Y})$, $1 \le p < \infty$, in the statements. The same holds for Corollary 6 below.

Remark 3. If the translation invariant kernel $K$ is $C_0$, then Propositions 15 and 16 characterize universality of $K$.

Remark 4. If $X = \mathbb{R}^d$, $\mathcal{Y} = \mathbb{C}$, and $\operatorname{supp} \nu$ is a subset of $\hat{X} = \mathbb{R}^d$ such that every entire function on $\mathbb{C}^d$ vanishing on it is identically zero, then $K$ is compact-universal (see [22, Proposition 14]). This follows by (39), taking into account that, for compactly supported $\mu$, the Fourier transform of $\varphi\mu$ can be extended to an entire function defined on $\mathbb{C}^d$.

In particular, if $d = 1$ a sufficient condition for compact-universality is that $\operatorname{supp} \nu$ has an accumulation point.

Based on the above remark, we give another example of a compact-universal kernel which is not universal, see also Example [H]

Example 11. Let $K : \mathbb{R} \times \mathbb{R} \to \mathbb{C}$ be the $C_0$-kernel

\[ K(x,t) = \int_{-1}^{1} e^{2\pi i (t-x) p}\, dp = \frac{\sin(2\pi(t-x))}{\pi (t-x)}, \]

with $\nu$ the restriction of the Lebesgue measure to $[-1, 1]$. Since the support of $\nu$ admits an accumulation point, $K$ is compact-universal by the last remark. On the other hand, since $\operatorname{supp} \nu$ is not the whole $\mathbb{R}$, $K$ is not universal by Proposition 15.
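The closed form of this band-limited kernel can be verified numerically. The following Python sketch approximates $\int_{-1}^{1} e^{2\pi i (t-x) p}\, dp$ with a midpoint rule and compares it with $\sin(2\pi(t-x))/(\pi(t-x))$, using the value 2 on the diagonal. The function names and the step count are illustrative choices.

```python
import cmath
import math


def band_limited_kernel(x, t, steps=4000):
    """Midpoint rule for the integral over p in [-1, 1] of exp(2*pi*i*(t - x)*p)."""
    h = 2.0 / steps
    total = 0j
    for k in range(steps):
        p = -1.0 + (k + 0.5) * h
        total += cmath.exp(2j * math.pi * (t - x) * p)
    return total * h


def closed_form(x, t):
    """sin(2*pi*(t - x)) / (pi*(t - x)), extended by continuity to 2 when t == x."""
    u = t - x
    if u == 0:
        return 2.0
    return math.sin(2.0 * math.pi * u) / (math.pi * u)
```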

We now exhibit a particular case in which Proposition 16 applies.

Corollary 6. Let $K$ be a translation invariant Mercer kernel such that $\langle K_0(\cdot)y, y'\rangle \in L^1(X, dx)$ for all $y, y' \in \mathcal{Y}$. Let $B : \hat{X} \to \mathcal{L}(\mathcal{Y})$ be as in (37). If $B(\chi)$ is injective for $d\chi$-almost all $\chi$, then the reproducing kernel Hilbert space $\mathcal{H}_K$ is dense in $L^2(X, \mu; \mathcal{Y})$ for any probability measure $\mu$.

Proof. Since the support of the Haar measure $d\chi$ is $\hat{X}$, the claim is a consequence of Proposition 16. □

6 Examples of universal kernels

In this section we present various examples of universal kernels, some of which have already been introduced in Section 3.

We start with the Gaussian kernel, which is a well known example of a universal kernel. The first proof of universality is given in [SU] with a different technique and in [22] by means of the Fourier transform. In both papers only compact-universality is taken into account.

Example 12. Let $X$ be a closed subset of $\mathbb{R}^d$, $\mathcal{Y} = \mathbb{C}$ and

\[ K(x,t) = e^{-\frac{\|x-t\|^2}{2\sigma^2}}, \qquad x, t \in X, \]

where $\sigma > 0$. Then $K$ is a $C_0$-universal kernel.

Proof. Assume first that $X = \mathbb{R}^d$, regarded as an abelian group; then $K$ is a translation invariant kernel with $K_0$ in $C_0(\mathbb{R}^d) \cap L^1(\mathbb{R}^d, dx)$. According to (37)

\[ B(p) = \sqrt{(2\pi\sigma^2)^d}\; e^{-2\pi^2\sigma^2 \|p\|^2}, \]

where the dual group is identified with $\mathbb{R}^d$ by means of $\chi_p(x) = e^{2\pi i\, p\cdot x}$. Since $B(p) > 0$ for all $p \in \mathbb{R}^d$, universality is a consequence of Corollary 6.

If $X$ is an arbitrary closed subset of $\mathbb{R}^d$ it is enough to apply Corollary 3. □
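A finite-sample consequence of universality can be observed numerically. If $\mu$ is the uniform probability measure on finitely many distinct points, injectivity of $L_\mu$ amounts to the Gram matrix of the kernel at those points being nonsingular, hence positive definite. The following Python sketch checks this for the Gaussian kernel in dimension one by running a Cholesky factorization; the sample points, the value of `sigma` and the function names are illustrative assumptions.

```python
import math


def gaussian_gram(points, sigma):
    """Gram matrix [exp(-(x - t)^2 / (2 sigma^2))] for real sample points."""
    return [[math.exp(-(x - t) ** 2 / (2.0 * sigma ** 2)) for t in points]
            for x in points]


def cholesky_pivots(a):
    """Squared diagonal pivots of the Cholesky factorization of a symmetric matrix.
    The factorization stops at the first non-positive pivot; all pivots are
    positive exactly when the matrix is positive definite."""
    size = len(a)
    low = [[0.0] * size for _ in range(size)]
    pivots = []
    for i in range(size):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivots.append(s)
                if s <= 0.0:
                    return pivots
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return pivots
```

For instance, `cholesky_pivots(gaussian_gram([-2.0, -1.0, 0.0, 1.0, 2.0], 1.0))` returns five strictly positive pivots.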

The next example is well known in functional analysis (see, for example, [3]).

Example 13. Let $X = \mathbb{R}$, $\mathcal{Y} = \mathbb{C}$ and let

\[ K(x,t) = e^{-2\pi |x-t|}. \]

Then the kernel $K$ is a $C_0$-universal kernel and $\mathcal{H}_K = W^1_2(\mathbb{R})$, the Sobolev space of measurable complex functions $f$ on $\mathbb{R}$ with finite norm

\[ \|f\|^2 = \int_{\mathbb{R}} \left( |f(x)|^2 + |f'(x)|^2 \right) dx, \]

where $f'$ is the weak derivative.

Proof. The same reasoning as above, observing that $B(p) = \frac{1}{\pi + \pi p^2} > 0$ for all $p \in \mathbb{R}$. □
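The density $B(p)$ of this example can be checked by quadrature. The following Python sketch assumes the kernel profile $e^{-2\pi|x|}$ and the character $\chi_p(x) = e^{2\pi i p x}$, and compares a midpoint-rule approximation of the Fourier integral with $1/(\pi(1+p^2))$. Only the cosine part is integrated, since the sine part cancels by symmetry. The function names and grid parameters are illustrative.

```python
import math


def exponential_density(p, half_width=8.0, steps=32000):
    """Midpoint rule for the integral over x of cos(2*pi*p*x) * exp(-2*pi*|x|).
    The grid is symmetric, so the kink at x = 0 falls on a cell boundary."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        total += math.cos(2.0 * math.pi * p * x) * math.exp(-2.0 * math.pi * abs(x))
    return total * h


def closed_form(p):
    """1 / (pi * (1 + p^2)), strictly positive for every real p."""
    return 1.0 / (math.pi * (1.0 + p * p))
```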

The next example characterizes universal kernels of the form $K = \kappa B$, see Example 5.

Example 14. Let $\kappa$ be a $C_0$-scalar reproducing kernel and $B$ a positive operator. The kernel $K = \kappa B$ is universal if and only if $\kappa$ is universal and $B$ is injective.

Proof. We have to show that, given a probability measure $\mu$, $\mathcal{H}_{\kappa B}$ is dense in $L^2(X, \mu; \mathcal{Y})$. The space $\mathcal{H}_{\kappa B}$ is unitarily equivalent to $\mathcal{H}_\kappa \otimes (\ker B)^\perp$ by means of $W(\varphi \otimes y)(x) = \varphi(x) B^{1/2} y$, see Example 5. Hence, it is enough to prove that $\mathcal{H}_\kappa \otimes B^{1/2}\mathcal{Y}$ is dense in $L^2(X, \mu) \otimes \mathcal{Y}$. This is the case if and only if $\mathcal{H}_\kappa$ is dense in $L^2(X, \mu)$ and $B^{1/2}$ has dense range, and this last condition is equivalent to the fact that $B$ is injective since $B$ is a positive operator. □
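The necessity of the injectivity of $B$ can be illustrated in finite dimensions. The following Python sketch takes a Gaussian scalar kernel as a stand-in for $\kappa$, real $2 \times 2$ matrices for $B$, and evaluates the quadratic form $\sum_{i,j} \kappa(x_i, x_j) \langle B y_j, y_i\rangle$. When all $y_i$ lie in $\ker B$ the form vanishes even though the $y_i$ are nonzero. The kernel choice, the matrices and the names are illustrative assumptions.

```python
import math


def kappa(x, t):
    """Gaussian scalar kernel, used here only as a stand-in for kappa."""
    return math.exp(-(x - t) ** 2 / 2.0)


def apply_operator(B, y):
    """Matrix-vector product B y for a real matrix B and a real vector y."""
    return [sum(B[a][c] * y[c] for c in range(len(y))) for a in range(len(B))]


def quadratic_form(points, ys, B):
    """sum_{i,j} kappa(x_i, x_j) <B y_j, y_i> for real vectors y_i."""
    total = 0.0
    for i, x in enumerate(points):
        for j, t in enumerate(points):
            By = apply_operator(B, ys[j])
            total += kappa(x, t) * sum(By[a] * ys[i][a] for a in range(len(By)))
    return total
```

With `B = [[1.0, 0.0], [0.0, 0.0]]` and vectors proportional to `(0, 1)` the form is zero, whereas an injective positive matrix such as `[[2.0, 1.0], [1.0, 2.0]]` gives a strictly positive value on the same vectors.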

The same result holds replacing $C_0$-kernel with Mercer kernel and universality with compact-universality.

Example 15. Let $\kappa : X \times X \to \mathbb{C}$ and $\kappa' : X' \times X' \to \mathbb{C}$ be two scalar $C_0$ reproducing kernels on $X$ and $X'$, respectively. Let $I'$ be the identity operator on $\mathcal{H}_{\kappa'}$.

(i) The $\mathcal{H}_{\kappa'}$-kernel $K = \kappa I'$ is universal if and only if $\kappa$ is universal.

(ii) Fixed a probability measure $\mu'$ on $X'$, the $L^2(X', \mu')$-kernel $K = \kappa L_{\mu'}$ is universal if and only if $\kappa$ is universal and $\mathcal{H}_{\kappa'}$ is dense in $L^2(X', \mu')$.

(iii) The scalar kernel $\kappa \times \kappa'$ is universal if both $\kappa$ and $\kappa'$ are universal.

Proof. Items (i) and (ii) follow immediately from Example 14 and Proposition [TUl Item (iii) is a consequence of Proposition and the density of $C_0(X) \otimes C_0(X')$ in $C_0(X \times X')$. □

The following class of examples is considered in [H] . 

Example 16. Let $X$ be a locally compact second countable abelian group. Let $\{B^i\}_{i=1}^N$ be a finite set of positive operators on $\mathcal{Y}$ and $\{k_0^i\}_{i=1}^N$ be a finite set of scalar functions of positive type in $C_0(X) \cap L^1(X, dx)$. The translation invariant kernel $K$

\[ K(x,t) = \sum_{i=1}^N k_0^i(x-t)\, B^i \]

is universal provided that $\bigcap_i \ker B^i = \{0\}$ and, for each $i = 1, \dots, N$, there is an open dense subset $Z^i \subset \hat{X}$ such that $\mathcal{F}(k_0^i) > 0$ on $Z^i$.

Proof. Clearly, $\langle K_0(\cdot)y, y'\rangle$ is in $L^1(X, dx)$. Moreover, according to (37), for all $y \in \mathcal{Y}$ and $\chi \in \hat{X}$

\[ B(\chi)y = \sum_{i=1}^N \mathcal{F}(k_0^i)(\chi^{-1})\, B^i y. \]

Each $Z^i$ is open and dense, hence $Z = \bigcap_i Z^i$ is dense in $\hat{X}$. Let $\chi \in Z$ and $y \in \mathcal{Y}$ such that $B(\chi)y = 0$; then $B^i y = 0$ for all $i = 1, \dots, N$, since every $B^i$ is a positive operator and $\mathcal{F}(k_0^i) > 0$ on $Z^i$, so that by assumption $y = 0$. Therefore, $K$ is universal by Corollary 6. □

A Vector valued measures 

In this appendix we describe the dual of $C_0(X; \mathcal{Y})$. For $\mathcal{Y} = \mathbb{C}$, it is a well known result that $C_0(X)^*$ can be identified with the Banach space of complex measures on $X$. For arbitrary $\mathcal{Y}$, a similar result holds by considering the space of vector measures. If $X$ is compact, this result is due to [28] and we slightly extend it to $X$ being only locally compact. The proof we give is simpler than the original one also for $X$ compact.

Moreover, by using a version of the Radon-Nikodym theorem for vector valued measures, it is possible to describe the dual of $C_0(X; \mathcal{Y})$ in a simpler way. Indeed, the following result holds.

Theorem 7. Let $T \in C_0(X; \mathcal{Y})^*$. There exist a unique probability measure $\mu$ on $X$ and a unique function $h \in L^\infty(X, \mu; \mathcal{Y})$ such that

\[ T(f) = \int_X \langle f(x), h(x)\rangle\, d\mu(x) \qquad f \in C_0(X; \mathcal{Y}) \tag{40} \]

with $\|h(x)\| = \|T\|$ for $\mu$-almost all $x \in X$.

Proof. It follows combining Theorems 8 and 9 below. □
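On a finite set the representation (40) can be written down explicitly. The following Python sketch assumes $X = \{0, 1, 2\}$ and $\mathcal{Y} = \mathbb{C}^2$, so that $C_0(X; \mathcal{Y})$ is the space of all functions $f : X \to \mathbb{C}^2$ with the sup norm, and takes a hypothetical functional $T(f) = \sum_x \langle f(x), m_x\rangle$. In this setting $\|T\| = \sum_x \|m_x\|$, the probability measure has weights $\|m_x\| / \|T\|$, and $h(x) = \|T\|\, m_x / \|m_x\|$. The vectors `atoms` and all names are illustrative.

```python
import math


def norm(v):
    """Euclidean norm on C^m."""
    return math.sqrt(sum(abs(c) ** 2 for c in v))


def inner(u, v):
    """Inner product on C^m, linear in the first argument."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


# Hypothetical nonzero vectors m_x defining T(f) = sum_x <f(x), m_x>.
atoms = [(3.0, 4.0j), (0.0, -1.0), (2.0, 0.0)]


def T(f):
    """The functional T evaluated at f, given as a list of vectors f[x]."""
    return sum(inner(f[x], atoms[x]) for x in range(len(atoms)))


op_norm = sum(norm(m) for m in atoms)    # norm of T with respect to the sup norm
mu = [norm(m) / op_norm for m in atoms]  # weights of the probability measure
h = [tuple(op_norm * c / norm(m) for c in m) for m in atoms]  # ||h(x)|| = ||T||


def T_via_density(f):
    """Right-hand side of (40) in this finite setting."""
    return sum(inner(f[x], h[x]) * mu[x] for x in range(len(atoms)))
```

The function $g(x) = m_x / \|m_x\|$ has sup norm 1 and satisfies $T(g) = \|T\|$, which shows that the norm is attained in this finite example.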

Observe that, given $\mu$ and $h$ as in the statement of the theorem, if we define $T$ by (40), then $T \in C_0(X; \mathcal{Y})^*$. Hence (40) completely characterizes the dual of $C_0(X; \mathcal{Y})$ in terms of pairs $(\mu, h)$.

To prove the theorem, we recall some basic facts from the theory of vector valued measures (see [Ullin]). If $A \in \mathcal{B}(X)$, we denote by $\Pi(A)$ the family of partitions of $A$ into finite or denumerable disjoint Borel subsets.

Definition 3. A vector measure on $X$ with values in $\mathcal{Y}$ is a mapping $M : \mathcal{B}(X) \to \mathcal{Y}$ such that

(i)
\[ \sup_{\{A_i\} \in \Pi(X)} \sum_i \|M(A_i)\| < \infty; \]

(ii) for all $A \in \mathcal{B}(X)$ and $\{A_i\} \in \Pi(A)$
\[ M(A) = \sum_i M(A_i), \]
where the sum converges absolutely by item (i).

If $M$ is a $\mathcal{Y}$-valued vector measure on $X$, for all $A \in \mathcal{B}(X)$ we define

\[ |M|(A) = \sup_{\{A_i\} \in \Pi(A)} \sum_i \|M(A_i)\|. \]

Then, $|M|$ is a bounded positive measure on $X$, called the total variation of $M$.
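For a measure supported on finitely many atoms the supremum defining $|M|$ can be computed by brute force. The following Python sketch assumes a finite set $X$, real vectors $m_x$ attached to each point, and $M(A) = \sum_{x \in A} m_x$. It enumerates every partition of a subset and maximizes $\sum_i \|M(A_i)\|$; by the triangle inequality the maximum is attained on the partition into singletons. The data and names are illustrative.

```python
import math


def partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        yield [[first]] + smaller
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(abs(c) ** 2 for c in v))


def measure(atoms, block):
    """M(A) = sum of the atom vectors m_x over the points x of the block A."""
    dim = len(next(iter(atoms.values())))
    return [sum(atoms[x][a] for x in block) for a in range(dim)]


def total_variation(atoms, subset):
    """Supremum over all partitions of `subset` of the sum of ||M(A_i)||."""
    return max(sum(norm(measure(atoms, block)) for block in p)
               for p in partitions(list(subset)))
```

For example, with atoms `(3, 4)`, `(-3, -4)`, `(1, 0)` and `(0, 2)` the total variation of the whole set is $5 + 5 + 1 + 2 = 13$, while $\|M(X)\| = \|(1, 2)\|$ is much smaller because of cancellation.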

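The integration of a function $f \in L^1(X, |M|; \mathcal{Y})$ with respect to $M$ is defined as follows. Let $St(X; \mathcal{Y})$ be the space of functions $f = \sum_{i=1}^n 1_{A_i} v_i$, with $A_i$ disjoint Borel sets and $v_i \in \mathcal{Y}$ ($1_A$ is the characteristic function of the set $A$). For such $f$'s, define

\[ \int_X \langle f(x), dM(x)\rangle := \sum_{i=1}^n \langle v_i, M(A_i)\rangle. \tag{41} \]

Since

\[ \Big| \sum_{i=1}^n \langle v_i, M(A_i)\rangle \Big| \le \sum_{i=1}^n \|v_i\|\, \|M(A_i)\| \le \sum_{i=1}^n \|v_i\|\, |M|(A_i) = \|f\|_1, \]

the integral (41) extends to a bounded functional on $L^1(X, |M|; \mathcal{Y})$, which is denoted again by $\int_X \langle f(x), dM(x)\rangle$. By Theorem 4.1 in [19], there exists $h \in L^\infty(X, |M|; \mathcal{Y})$ such that

\[ \int_X \langle f(x), dM(x)\rangle = \int_X \langle f(x), h(x)\rangle\, d|M|(x) \qquad \forall f \in L^1(X, |M|; \mathcal{Y}), \]

and $\|h(x)\| = 1$ for $|M|$-almost all $x$. These facts are collected in the following theorem.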

Theorem 8 (Radon-Nikodym). If $M$ is a $\mathcal{Y}$-valued vector measure on $X$, there exists a unique $|M|$-measurable function $h : X \to \mathcal{Y}$ such that $\|h(x)\| = 1$ for $|M|$-almost all $x$ and

\[ \int_X \langle f(x), dM(x)\rangle = \int_X \langle f(x), h(x)\rangle\, d|M|(x) \qquad \forall f \in L^1(X, |M|; \mathcal{Y}). \]

The function $h$ is called the density of $M$ with respect to $|M|$.
We denote by $\mathcal{M}(X; \mathcal{Y})$ the space of $\mathcal{Y}$-valued vector measures on $X$. The space $\mathcal{M}(X; \mathcal{Y})$ is a Banach space with respect to the norm

\[ \|M\| = |M|(X) \]

(see [n]). If $\mathcal{Y} = \mathbb{C}$, we let $\mathcal{M}(X) = \mathcal{M}(X; \mathbb{C})$. The next duality theorem is shown in [28] for $X$ compact, see also [IT].

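Theorem 9. If $C_0(X; \mathcal{Y})$ is endowed with the Banach space topology induced by the uniform norm, then $C_0(X; \mathcal{Y})^* = \mathcal{M}(X; \mathcal{Y})$, the duality being given by

\[ \langle f, M\rangle = \int_X \langle f(x), dM(x)\rangle \qquad \forall f \in C_0(X; \mathcal{Y}),\ M \in \mathcal{M}(X; \mathcal{Y}). \]

Proof. By Theorem 8 it is clear that, if $M \in \mathcal{M}(X; \mathcal{Y})$, then

\[ T_M(f) = \int_X \langle f(x), dM(x)\rangle = \int_X \langle f(x), h(x)\rangle\, d|M|(x) \]

defines a bounded functional $T_M$ on $C_0(X; \mathcal{Y})$.

Clearly $\|T_M\| \le \|M\|$. To show that $\|T_M\| = \|M\|$, fix by the Lusin theorem a function $g \in C_0(X; \mathcal{Y})$ such that $g(x) = h(x)$ for $x \in X \setminus Z$, $Z$ being a $|M|$-measurable set with $|M|(Z) < \epsilon$, and $\|g\|_\infty \le \|h\|_{|M|,\infty} = 1$. For $\epsilon$ small enough, we then have

\[ |M|(X) - 2\epsilon < |M|(X \setminus Z) - |M|(Z) \le \Big| \int_X \langle g(x), h(x)\rangle\, d|M|(x) \Big| \le |M|(X). \]

This shows that $\|T_M\| = \|M\|$.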

Suppose now $T \in C_0(X; \mathcal{Y})^*$. For $v \in \mathcal{Y}$, let $i_v : C_0(X) \to C_0(X; \mathcal{Y})$ be the bounded operator given by

\[ [i_v(\varphi)](x) = \varphi(x)\, v. \]

Since $T i_v \in C_0(X)^*$, by the Riesz theorem there exists a measure $\mu_v \in \mathcal{M}(X)$ such that

\[ T i_v(\varphi) = \int_X \varphi(x)\, d\mu_v(x) \qquad \text{and} \qquad \|T i_v\| = \|\mu_v\|. \]

For all $A \in \mathcal{B}(X)$, let $M(A)$ be the vector in $\mathcal{Y}$ such that

\[ \langle v, M(A)\rangle = \mu_v(A) \qquad \forall v \in \mathcal{Y} \]

($M(A)$ is well defined, since $|\mu_v(A)| \le \|\mu_v\| = \|T i_v\| \le \|T\|\, \|v\|$).
We now show that, if $A \in \mathcal{B}(X)$ and $\{A_i\} \in \Pi(A)$, then

\[ \sum_i \|M(A_i)\| \le \|T\|, \]

so that item (i) of Definition 3 holds. It is enough to prove it for all finite partitions $\{A_i\}_{i=1,\dots,n}$. Let $v_i = M(A_i) / \|M(A_i)\|$ (we set $v_i = 0$ whenever $M(A_i) = 0$). We have

\[ \sum_i \|M(A_i)\| = \sum_i \langle v_i, M(A_i)\rangle = \sum_i \mu_{v_i}(A_i). \]

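Set $\nu = \sum_i |\mu_{v_i}|$, which is a bounded positive measure, and every $\mu_{v_i}$ has density with respect to $\nu$. For all $i = 1, \dots, n$, fix a sequence $\{\varphi_j^{(i)}\}_{j \in \mathbb{N}}$ in $C_c(X)$ such that $\lim_j \varphi_j^{(i)}(x) = 1_{A_i}(x)$ for $\nu$-almost all $x$. Define

\[ \psi_j(x) = \Big[ 1 \vee \sum_{k=1}^n |\varphi_j^{(k)}(x)| \Big]^{-1} \sum_{i=1}^n \varphi_j^{(i)}(x)\, v_i. \]

Then, $\psi_j \in C_c(X; \mathcal{Y})$, and $\|\psi_j(x)\| \le 1$ for all $x$. Moreover,

\[ \Big| \Big[ 1 \vee \sum_{k=1}^n |\varphi_j^{(k)}(x)| \Big]^{-1} \varphi_j^{(i)}(x) \Big| \le 1 \qquad \forall x, i \]

and

\[ \lim_j \Big[ 1 \vee \sum_{k=1}^n |\varphi_j^{(k)}(x)| \Big]^{-1} \varphi_j^{(i)}(x) = 1_{A_i}(x) \qquad \text{for $\nu$-almost all $x$.} \]

Therefore

\[ \Big| \sum_i \|M(A_i)\| - T\psi_j \Big| = \Big| \sum_i \Big\{ \mu_{v_i}(A_i) - T i_{v_i}\Big( \Big[ 1 \vee \sum_k |\varphi_j^{(k)}| \Big]^{-1} \varphi_j^{(i)} \Big) \Big\} \Big| \le \sum_i \int_X \Big| 1_{A_i}(x) - \Big[ 1 \vee \sum_k |\varphi_j^{(k)}(x)| \Big]^{-1} \varphi_j^{(i)}(x) \Big|\, d|\mu_{v_i}|(x) \longrightarrow 0 \quad (j \to \infty) \]

by the dominated convergence theorem. On the other hand, $|T\psi_j| \le \|T\|\, \|\psi_j\|_\infty \le \|T\|$. It follows that $\sum_{i=1}^n \|M(A_i)\| \le \|T\|$, as claimed.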
We now show that

\[ M(A) = \sum_i M(A_i) \]

(absolutely) for all $A \in \mathcal{B}(X)$ and $\{A_i\} \in \Pi(A)$. We have just proved that the right hand side is absolutely convergent, and the equality follows by

\[ \Big\langle v, \sum_i M(A_i) \Big\rangle = \sum_i \mu_v(A_i) = \mu_v(A) = \langle v, M(A)\rangle \qquad \forall v \in \mathcal{Y}. \]

Therefore, $M$ is a $\mathcal{Y}$-valued measure. It remains to show that $T = T_M$. Let $h$ and $|M|$ be associated to $M$ as in the Radon-Nikodym theorem. Then, for any Borel set $A \subset X$, we have $\mu_v(A) = \int_A \langle v, h(x)\rangle\, d|M|(x)$, from which it follows that $\mu_v$ has density $\langle v, h(x)\rangle$ with respect to $|M|$. For $\varphi \in C_c(X)$ and $v \in \mathcal{Y}$, we thus have

\[ T(\varphi v) = \int_X \varphi(x)\, d\mu_v(x) = \int_X \langle \varphi(x) v, h(x)\rangle\, d|M|(x) = T_M(\varphi v). \]

Then, $T = T_M$ by density of $C_c(X) \otimes \mathcal{Y}$ in $C_0(X; \mathcal{Y})$. □